
I have the following code snippet, where I pick up a file from an FTP server and then stream it to OpenCSV. I have very little experience with streams, so I'd like someone to review this code for efficiency. I run this in the cloud, so I/O and memory efficiency are important. My CSV files generally contain a few hundred thousand records each.

    public InputStream getData(ImportTask importTask) throws IOException {
        //FTP Connection code
        try {
            //enter passive mode
            ftp.enterLocalPassiveMode();
            if (!ftp.setFileType(FTP.BINARY_FILE_TYPE)) {
                System.out.println("Setting binary file type failed.");
            }
            ByteArrayOutputStream output = new ByteArrayOutputStream();
            String file = importTask.isZipfile() ? importTask.getZipFilename() : importTask.getFilename();
            ftp.retrieveFile(file, output);
            InputStream inputStream = new ByteArrayInputStream(output.toByteArray());
            if (importTask.isZipfile()) {
                inputStream = importUtils.getZipData(new ZipInputStream(inputStream), importTask.getFilename());
            }
            ftp.logout();
            ftp.disconnect();
            return inputStream;
        } catch (FileNotFoundException ex) {
            // a checked Exception is not allowed by "throws IOException"; wrap it in an IOException
            throw new IOException(ex);
        }
    }
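`importUtils.getZipData` is not shown, and the `ByteArrayOutputStream` plus `toByteArray()` pair briefly holds two full copies of the download in memory. A lower-memory route could keep everything as a stream: Commons Net's `FTPClient.retrieveFileStream(String)` returns an `InputStream` for the remote file (the caller must close it and then call `completePendingCommand()`), though the FTP session would then have to stay open until the CSV has been read. The sketch below covers only the zip part, using the JDK. It assumes `getZipData` is meant to return the contents of the entry named `importTask.getFilename()`; `ZipEntries.openEntry` is a hypothetical name.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipEntries {
    private ZipEntries() {
    }

    // Hypothetical stand-in for importUtils.getZipData: skips entries until the named one
    // is found and returns the same stream positioned at it. Nothing is copied into memory;
    // reads return that entry's bytes and then -1 at the end of the entry.
    static InputStream openEntry(ZipInputStream zip, String entryName) throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (!entry.isDirectory() && entry.getName().equals(entryName)) {
                return zip;
            }
        }
        zip.close();
        throw new FileNotFoundException("No entry named " + entryName + " in archive");
    }
}
```

The returned stream can be wrapped in an `InputStreamReader` and handed straight to the CSV parser.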

OpenCSV to POJO

    InputStream is;
    try {
        is = dataSource.getData(importTask);
    } catch (IOException ex) {
        logger.error(ex.getMessage());
        return; // "is" is unassigned if getData fails (assumes the enclosing method returns void)
    }
    final HeaderColumnNameTranslateMappingStrategy<MyObject> strategy = new HeaderColumnNameTranslateMappingStrategy<>();
    strategy.setType(MyObject.class);
    strategy.setColumnMapping(columnMappings);
    final CsvToBean<MyObject> csv = new CsvToBean<>();
    List<MyObject> list = csv.parse(strategy, new InputStreamReader(is));
    for (MyObject myObject : list) {
        //Do something
    }
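`csv.parse(...)` builds the whole `List` before the loop runs, so every bean for a few hundred thousand records is in memory at once. If each record can be handled independently, processing rows as they are read keeps only one row alive. A JDK-only sketch of that pattern, assuming a header line and plain comma-separated values with no quoting (real CSV data needs a proper parser such as OpenCSV's `CSVReader`); `RowStreamer.forEachRow` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class RowStreamer {
    private RowStreamer() {
    }

    // Reads the header line, then passes one header->value map per data line to the consumer.
    // Only the current row is held in memory; the caller remains responsible for closing source.
    // Assumes no quoted fields, embedded commas, or embedded newlines.
    static long forEachRow(Reader source, Consumer<Map<String, String>> consumer) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        String headerLine = reader.readLine();
        if (headerLine == null) {
            return 0; // empty input: no header, no rows
        }
        String[] headers = headerLine.split(",", -1);
        long rows = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = line.split(",", -1);
            Map<String, String> row = new HashMap<>();
            for (int i = 0; i < headers.length && i < values.length; i++) {
                row.put(headers[i], values[i]);
            }
            consumer.accept(row);
            rows++;
        }
        return rows;
    }
}
```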
Jamal
asked Feb 27, 2015 at 16:28

1 Answer


Switch to uniVocity-parsers: it is twice as fast as OpenCSV, and it can read the input on a separate thread while the parser processes rows.

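    // MyObject's fields must be annotated with @Parsed for BeanListProcessor to populate them
    final BeanListProcessor<MyObject> clientProcessor = new BeanListProcessor<MyObject>(MyObject.class);
    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setLineSeparator("\n");
    settings.setHeaderExtractionEnabled(true); // treat the first row as headers so fields match by name
    settings.setRowProcessor(clientProcessor);
    settings.setReadInputOnSeparateThread(true); // this is enabled by default
    CsvParser parser = new CsvParser(settings);
    // inputStream comes from getData(), which already unwraps the zip entry; a fresh ZipInputStream
    // would return no data until getNextEntry() is called
    parser.parse(new InputStreamReader(inputStream));
    List<MyObject> rows = clientProcessor.getBeans();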
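`setReadInputOnSeparateThread(true)` lets one thread read input while another parses it. The JDK-only sketch below illustrates that producer/consumer idea with a bounded queue; it is an illustration of the concept, not uniVocity's implementation, and `ReadAheadLines`, the queue capacity and the end marker are hypothetical choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

final class ReadAheadLines {
    // Marker object placed on the queue after the last line; compared by identity.
    private static final Object END = new Object();

    private ReadAheadLines() {
    }

    // A background thread reads lines into a bounded queue while the calling thread consumes
    // them, so reading and processing overlap. The bound caps how far the reader runs ahead,
    // which also caps the extra memory used. The reader thread closes source when done.
    static void process(Reader source, int capacity, Consumer<String> consumer)
            throws IOException, InterruptedException {
        BlockingQueue<Object> queue = new ArrayBlockingQueue<>(capacity);
        AtomicReference<Exception> failure = new AtomicReference<>();
        Thread readerThread = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(source)) {
                String line;
                while ((line = in.readLine()) != null) {
                    queue.put(line); // blocks while the queue is full
                }
            } catch (IOException | InterruptedException e) {
                failure.set(e);
            } finally {
                try {
                    queue.put(END); // always tell the consumer that input has ended
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        readerThread.setDaemon(true); // sketch only: don't keep the JVM alive if the consumer fails
        readerThread.start();

        Object item;
        while ((item = queue.take()) != END) {
            consumer.accept((String) item);
        }
        readerThread.join();
        if (failure.get() != null) {
            throw new IOException("Reading input failed", failure.get());
        }
    }
}
```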

If you are not using all columns of your CSV input, you can select only the columns you want, either by position or by header name, to make the process even faster:

    settings.selectIndexes(4, 6, 3);                  // select columns by position
    settings.selectFields("Field A", "B", "and C");   // or select columns by header name
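Selecting columns means only the requested values need to be kept and converted for each row. A hypothetical JDK-only sketch of selection by header name (not uniVocity's code): the names are resolved to positions once, then each row is projected onto them. This sketch returns values in the requested order, which is a design choice of the example.

```java
import java.util.Arrays;
import java.util.List;

final class ColumnSelector {
    private final int[] indexes;

    // Resolves the wanted header names to column positions once, up front.
    ColumnSelector(String[] headers, String... wanted) {
        List<String> headerList = Arrays.asList(headers);
        indexes = new int[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            int index = headerList.indexOf(wanted[i]);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown column: " + wanted[i]);
            }
            indexes[i] = index;
        }
    }

    // Returns only the selected values, in the order they were requested;
    // a short row yields null for any missing column.
    String[] select(String[] row) {
        String[] out = new String[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            out[i] = indexes[i] < row.length ? row[indexes[i]] : null;
        }
        return out;
    }
}
```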

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

answered Jun 9, 2015 at 8:59
  • You should use var on the left-hand side of assignments, as the type is easily discernible from the right-hand side. Commented Jun 9, 2015 at 9:08
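`var` (local-variable type inference, JEP 286) arrived in Java 10 in 2018, so it requires Java 10 or later. A minimal JDK-only illustration of the suggestion; the method and variable names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

final class VarExample {
    private VarExample() {
    }

    static int countSelectedColumns() {
        // Inferred as HashMap<String, List<String>> from the right-hand side (Java 10+).
        var columnsByFile = new HashMap<String, List<String>>();
        // Inferred as ArrayList<String>.
        var columns = new ArrayList<String>();
        columns.add("Field A");
        columns.add("B");
        columnsByFile.put("import.csv", columns);
        var total = 0; // inferred as int
        for (var entry : columnsByFile.entrySet()) {
            total += entry.getValue().size();
        }
        return total;
    }
}
```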
