I have the following code snippet, which picks up a file from an FTP server and streams it to OpenCSV. I have very little experience with streams, so I'd like someone to review this code for efficiency. It runs in the cloud, so I/O and memory efficiency are important. My CSV files generally contain a few hundred thousand records each.
    public InputStream getData(ImportTask importTask) throws IOException {
        //FTP Connection code
        try {
            //enter passive mode
            ftp.enterLocalPassiveMode();
            if (!ftp.setFileType(FTP.BINARY_FILE_TYPE)) {
                System.out.println("Setting binary file type failed.");
            }
            // the whole file is buffered in memory here
            ByteArrayOutputStream output = new ByteArrayOutputStream();
            String file = importTask.isZipfile() ? importTask.getZipFilename() : importTask.getFilename();
            ftp.retrieveFile(file, output);
            InputStream inputStream = new ByteArrayInputStream(output.toByteArray());
            if (importTask.isZipfile()) {
                inputStream = importUtils.getZipData(new ZipInputStream(inputStream), importTask.getFilename());
            }
            ftp.logout();
            ftp.disconnect();
            return inputStream;
        } catch (FileNotFoundException ex) {
            // rethrow as IOException: the method only declares IOException,
            // so throwing a plain Exception would not compile
            throw new IOException(ex);
        }
    }
OpenCSV to POJO
    InputStream is;
    try {
        is = dataSource.getData(importTask);
    } catch (IOException ex) {
        logger.error(ex.getMessage());
        return; // nothing to parse; assumes the enclosing method returns void
    }
    final HeaderColumnNameTranslateMappingStrategy<MyObject> strategy = new HeaderColumnNameTranslateMappingStrategy<>();
    strategy.setType(MyObject.class);
    strategy.setColumnMapping(columnMappings);
    final CsvToBean csv = new CsvToBean();
    List list = csv.parse(strategy, new InputStreamReader(is));
    for (Object object : list) {
        MyObject myObject = (MyObject) object;
        //Do something
    }
1 Answer
Switch to uniVocity-parsers: it is about twice as fast as OpenCSV, and it reads the input on a separate thread while the rows are being parsed.
    final BeanListProcessor<MyObject> clientProcessor = new BeanListProcessor<MyObject>(MyObject.class);
    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setLineSeparator("\n");
    settings.setRowProcessor(clientProcessor);
    settings.setReadInputOnSeparateThread(true); // this is enabled by default
    CsvParser parser = new CsvParser(settings);
    // inputStream is what getData(...) returned; it has already been unzipped there,
    // so it is not wrapped in another ZipInputStream here
    parser.parse(new InputStreamReader(inputStream));
    List<MyObject> rows = clientProcessor.getBeans();
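If the stream still holds raw zip data, keep in mind that a `ZipInputStream` yields no bytes until it has been positioned on an entry with `getNextEntry()`. Below is a minimal sketch of that step using only the standard library. The class name, the method names, and the UTF-8 encoding are assumptions made for illustration; none of them come from the question's `importUtils`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
class ZipEntryReaderSketch {

    /** Returns a Reader positioned on the named entry, or null if the archive has no such entry. */
    static Reader openEntry(InputStream zipped, String entryName) {
        try {
            ZipInputStream zip = new ZipInputStream(zipped);
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    // From here on, reads return this entry's bytes until the entry ends.
                    return new InputStreamReader(zip, StandardCharsets.UTF_8); // assumed encoding
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a one-entry archive in memory so the sketch runs without an FTP server. */
    static byte[] zipOf(String entryName, String content) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
                zip.putNextEntry(new ZipEntry(entryName));
                zip.write(content.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the first line from the reader and closes it. */
    static String firstLine(Reader reader) {
        try (BufferedReader buffered = new BufferedReader(reader)) {
            return buffered.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] archive = zipOf("data.csv", "id,name\n1,Ann\n");
        Reader reader = openEntry(new ByteArrayInputStream(archive), "data.csv");
        System.out.println(firstLine(reader));
    }
}
```

A `Reader` obtained this way can be handed to `parser.parse(...)` directly, so the unzipped content never has to be copied into a second buffer.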
If you are not using all the columns of your CSV input, you can also select just the columns you want, which makes the process even faster:
    settings.selectIndexes(4, 6, 3);                 // select columns by position...
    settings.selectFields("Field A", "B", "and C");  // ...or by header name (use one or the other)
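Coming back to reading the input on a separate thread, the general idea can be sketched with the standard library alone: one thread pulls lines from the `Reader` into a bounded queue while the calling thread turns them into rows. This is only an illustration of the technique and not how uniVocity is implemented internally. The class name, the queue size, and the naive comma split (which ignores quoting) are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration of a producer/consumer split between I/O and parsing.
class BackgroundReaderSketch {

    // End-of-input marker, compared by identity so it can never equal a real line.
    private static final String END = new String("<end>");

    static List<String[]> parse(Reader input) {
        BlockingQueue<String> lines = new ArrayBlockingQueue<>(1024); // bounded: caps memory use
        AtomicReference<IOException> failure = new AtomicReference<>();

        // Producer: performs the blocking I/O and hands lines over through the queue.
        Thread producer = new Thread(() -> {
            try {
                try (BufferedReader reader = new BufferedReader(input)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        lines.put(line);
                    }
                } catch (IOException e) {
                    failure.set(e);
                }
                lines.put(END); // always signal completion, even after an I/O error
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "csv-reader");
        producer.start();

        // Consumer: splits lines into rows while the producer keeps reading ahead.
        List<String[]> rows = new ArrayList<>();
        try {
            for (String line = lines.take(); line != END; line = lines.take()) {
                rows.add(line.split(",", -1)); // naive split, no quote handling
            }
        } catch (InterruptedException e) {
            producer.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while parsing", e);
        }
        if (failure.get() != null) {
            throw new UncheckedIOException(failure.get());
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = parse(new StringReader("id,name\n1,Ann\n2,Bob\n"));
        System.out.println(rows.size() + " rows read");
    }
}
```

The bounded queue is the relevant design choice for memory: the reading thread blocks once it is 1024 lines ahead, so a slow consumer cannot cause the whole file to pile up in memory.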
Disclosure: I am the author of this library. It's open-source and free (Apache V2.0 license).
You should use var on the left-hand side of assignments, as the type is easily discernible from the right-hand side. – aydjay, Jun 9, 2015 at 9:08