Stream file to OpenCSV with the least amount of I/O possible

Question 1

I have the following code snippet, which picks up a file from an FTP server and then streams it to OpenCSV. I have very little experience with streams, so I'd like someone to review this code for efficiency. It runs in the cloud, so I/O and memory efficiency are important. My CSV files generally contain a few hundred thousand records each.

public InputStream getData(ImportTask importTask) throws IOException {
    // FTP connection code
    try {
        // enter passive mode
        ftp.enterLocalPassiveMode();
        if (!ftp.setFileType(FTP.BINARY_FILE_TYPE)) {
            System.out.println("Setting binary file type failed.");
        }
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        String file = importTask.isZipfile() ? importTask.getZipFilename() : importTask.getFilename();
        ftp.retrieveFile(file, output);
        InputStream inputStream = new ByteArrayInputStream(output.toByteArray());
        if (importTask.isZipfile()) {
            inputStream = importUtils.getZipData(new ZipInputStream(inputStream), importTask.getFilename());
        }
        ftp.logout();
        ftp.disconnect();
        return inputStream;
    } catch (FileNotFoundException ex) {
        // wrap in IOException so the rethrow matches the declared throws clause
        throw new IOException(ex);
    }
}
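
The `importUtils.getZipData` helper is not shown in the question. A minimal sketch of what such a helper might do, assuming it advances the `ZipInputStream` to the named entry and returns the stream positioned there, so the entry is decompressed on demand as the CSV parser reads it. The class and method names below are hypothetical, and the in-memory archive exists only to make the sketch runnable:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical stand-in for importUtils.getZipData; not the asker's actual helper.
final class ZipEntryStreams {

    // Advances the zip stream to the entry called entryName and returns the stream
    // positioned at the start of that entry. Reads then decompress on demand and
    // report end-of-stream when the entry ends.
    static InputStream openEntry(ZipInputStream zip, String entryName) throws IOException {
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.getName().equals(entryName)) {
                return zip;
            }
        }
        throw new FileNotFoundException("No entry named " + entryName + " in archive");
    }

    // Builds a one-entry archive in memory, only so the demo has something to read.
    static byte[] zipOf(String entryName, String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads the named entry fully into a String (demo only; a parser would read it incrementally).
    static String readEntry(byte[] archive, String entryName) throws IOException {
        try (InputStream in = openEntry(new ZipInputStream(new ByteArrayInputStream(archive)), entryName)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = zipOf("data.csv", "id,name\n1,Ada\n");
        System.out.print(readEntry(archive, "data.csv"));
    }
}
```

Returning the `ZipInputStream` itself, rather than copying the entry into another buffer, keeps only the decompression window in memory.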

OpenCSV to POJO

InputStream is;
try {
    is = dataSource.getData(importTask);
} catch (IOException ex) {
    logger.error(ex.getMessage());
    return; // nothing to parse if the download failed
}
final HeaderColumnNameTranslateMappingStrategy<MyObject> strategy = new HeaderColumnNameTranslateMappingStrategy<>();
strategy.setType(MyObject.class);
strategy.setColumnMapping(columnMappings);
final CsvToBean csv = new CsvToBean();
List list = csv.parse(strategy, new InputStreamReader(is));
for (Object object : list) {
    MyObject myObject = (MyObject) object;
    // Do something
}

Answer 1

Switch to uniVocity-parsers: it is twice as fast as OpenCSV, and it reads the input on a separate thread while parsing.

    final BeanListProcessor<MyObject> clientProcessor = new BeanListProcessor<MyObject>(MyObject.class);
    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setLineSeparator("\n");
    settings.setRowProcessor(clientProcessor);
    settings.setReadInputOnSeparateThread(true); // this is enabled by default
    CsvParser parser = new CsvParser(settings);
    ZipInputStream zip = new ZipInputStream(inputStream);
    zip.getNextEntry(); // position the stream at the first entry; until then, reads return end-of-stream
    parser.parse(new InputStreamReader(zip));
    List<MyObject> rows = clientProcessor.getBeans();
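
To illustrate what reading the input on a separate thread means in general, here is a minimal JDK-only sketch. It is not uniVocity's implementation; the class name, the queue size, and the naive comma split are all assumptions made for the example:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration: one thread reads lines while the caller's thread parses them.
final class BackgroundLineReader {

    // Unique marker object, compared by identity so it can never clash with real input.
    private static final String END_OF_INPUT = new String("END");

    static List<String[]> parse(Reader source) throws IOException, InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        AtomicReference<IOException> failure = new AtomicReference<>();

        Thread producer = new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(source)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    queue.put(line); // blocks when the parsing side falls behind
                }
            } catch (IOException e) {
                failure.set(e); // handed to the caller after the queue is drained
            } catch (InterruptedException e) {
                // Nothing interrupts this thread in the sketch; fall through to the marker.
            } finally {
                try {
                    queue.put(END_OF_INPUT); // always tell the consumer to stop waiting
                } catch (InterruptedException ignored) {
                    // Giving up here is acceptable for a sketch.
                }
            }
        }, "input-reader");
        producer.setDaemon(true);
        producer.start();

        List<String[]> rows = new ArrayList<>();
        String line;
        while ((line = queue.take()) != END_OF_INPUT) {
            rows.add(line.split(",", -1)); // naive split: no quoting or escaping support
        }
        if (failure.get() != null) {
            throw failure.get();
        }
        return rows;
    }
}
```

The bounded queue caps how far the reading thread can run ahead, so memory use stays flat even when the source is faster than the parser.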

If you don't need every column of your CSV input, you can select just the columns you want, by index or by header name, to make the process even faster:

    settings.selectIndexes(4, 6, 3); // select columns by position
    settings.selectFields("Field A", "B", "and C"); // or select columns by header name
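
As a rough picture of what selecting columns by index amounts to, the JDK-only sketch below keeps only the requested positions of an already-split row. It says nothing about how uniVocity implements selection internally; `ColumnProjection` and `project` are hypothetical names used for illustration:

```java
import java.util.Arrays;

// Hypothetical illustration of column selection; not part of uniVocity-parsers.
final class ColumnProjection {

    // Returns a new row holding only the requested columns, in the order requested.
    static String[] project(String[] row, int... indexes) {
        String[] selected = new String[indexes.length];
        for (int i = 0; i < indexes.length; i++) {
            // Rows that are too short yield null instead of throwing (a choice made for this sketch).
            selected[i] = indexes[i] < row.length ? row[indexes[i]] : null;
        }
        return selected;
    }

    public static void main(String[] args) {
        String[] row = "a,b,c,d,e,f,g".split(",");
        System.out.println(Arrays.toString(project(row, 4, 6, 3))); // prints [e, g, d]
    }
}
```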

Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).

Comment

You should use var on the left-hand side of assignments, as the type is easily discernible from the right-hand side.
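
A small sketch of that suggestion using only JDK types. Note that `var` requires Java 10 or later, and the class and method names here are hypothetical:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; it only demonstrates local variable type inference.
final class VarDemo {

    static List<String> readLines(byte[] data) throws IOException {
        // The right-hand side already names each type, so var avoids repeating it.
        var input = new ByteArrayInputStream(data);
        var lines = new ArrayList<String>();
        try (var reader = new BufferedReader(new InputStreamReader(input, StandardCharsets.UTF_8))) {
            String line; // kept explicit: there is no initializer to infer the type from
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```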

Jeronimo Backes · Answer 1 · 2015-06-09 08:59:11Z

