I am writing a Java app that gets the metadata of files in a directory and exports it to a .csv file. The app works fine if the number of files is fewer than a million. But if I feed in a path that has about 3,200,000 files across all of its directories and sub-directories, it takes forever. Is there a way I can speed things up here?
        private void extractDetailsCSV(File libSourcePath, String extractFile) throws ScraperException {
            log.info("Inside extract details csv");
            try{
                FileMetadataUtil fileUtil = new FileMetadataUtil();
                File[] listOfFiles = libSourcePath.listFiles();
                for(int i = 0; i < listOfFiles.length; i++) {
                    if(listOfFiles[i].isDirectory()) {
                        extractDetailsCSV(listOfFiles[i],extractFile);
                    }
                    if(listOfFiles[i].isFile()){
                        ScraperOutputVO so = new ScraperOutputVO();
                        Path path = Paths.get(listOfFiles[i].getAbsolutePath());
                        so.setFilePath(listOfFiles[i].getParent());
                        so.setFileName(listOfFiles[i].getName());
                        so.setFileType(getFileType(listOfFiles[i].getAbsolutePath()));
                        BasicFileAttributes basicAttribs = fileUtil.getBasicFileAttributes(path);
                        if(basicAttribs != null) {
                            so.setDateCreated(basicAttribs.creationTime().toString().substring(0, 10) + " " + basicAttribs.creationTime().toString().substring(11, 16));
                            so.setDateLastModified(basicAttribs.lastModifiedTime().toString().substring(0, 10) + " " + basicAttribs.lastModifiedTime().toString().substring(11, 16));
                            so.setDateLastAccessed(basicAttribs.lastAccessTime().toString().substring(0, 10) + " " + basicAttribs.lastAccessTime().toString().substring(11, 16));
                        }
                        so.setFileSize(String.valueOf(listOfFiles[i].length()));
                        so.setAuthors(fileUtil.getOwner(path));
                        so.setFolderLink(listOfFiles[i].getAbsolutePath());
                        writeCsvFileDtl(extractFile, so);
                        so.setFileName(listOfFiles[i].getName());
                        noOfFiles ++;
                    }
                }
            } catch (Exception e) {
                log.error("IOException while setting up columns" + e.fillInStackTrace());
                throw new ScraperException("IOException while setting up columns" , e.fillInStackTrace());
            }
            log.info("Done extracting details to csv file");
        }
        public void writeCsvFileDtl(String extractFile, ScraperOutputVO scraperOutputVO) throws ScraperException {
            try {
                FileWriter writer = new FileWriter(extractFile, true);
                writer.append(scraperOutputVO.getFilePath());
                writer.append(',');
                writer.append(scraperOutputVO.getFileName());
                writer.append(',');
                writer.append(scraperOutputVO.getFileType());
                writer.append(',');
                writer.append(scraperOutputVO.getDateCreated());
                writer.append(',');
                writer.append(scraperOutputVO.getDateLastModified());
                writer.append(',');
                writer.append(scraperOutputVO.getDateLastAccessed());
                writer.append(',');
                writer.append(scraperOutputVO.getFileSize());
                writer.append(',');
                writer.append(scraperOutputVO.getAuthors());
                writer.append(',');
                writer.append(scraperOutputVO.getFolderLink());
                writer.append('\n');
                writer.flush();
                writer.close();
            } catch (IOException e) {
                log.info("IOException while writing to csv file" + e.fillInStackTrace());
                throw new ScraperException("IOException while writing to csv file" , e.fillInStackTrace());
            }
        }
    }
Which version of Java do you use? – Marc-Andre, Aug 19, 2013 at 14:45
@Marc-Andre I use Java 7. – Nikhil Das Nomula, Aug 19, 2013 at 14:53
This link could help you: docs.oracle.com/javase/tutorial/essential/io/walk.html. It's a tutorial about walking a file tree using Java NIO. – Marc-Andre, Aug 19, 2013 at 14:56
3 Answers
I suspect that using Java's old File class is the root of your performance problem. Since you're using Java 7, you should use the newer java.nio.file classes. I've seen that you already use some of them, like Path, so it shouldn't be too difficult. I don't know what your class looks like at the moment, so I've changed some methods based on what I'm used to doing. The class I will be using is SimpleFileVisitor, since it is the basic implementation of FileVisitor.
So I've created a class Walker (this is a very bad name; you should change it to something clearer, since I have no good idea right now) that extends SimpleFileVisitor. The class has an attribute extractFile that corresponds to the filename of the CSV. It overrides preVisitDirectory, visitFile and visitFileFailed from FileVisitor. I've also added your method writeCsvFileDtl and a createDate helper (thanks to @unholysampler, you should read his answer too).
So the class should look like this:
    import static java.nio.file.FileVisitResult.CONTINUE;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Path;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.attribute.BasicFileAttributes;
    import java.nio.file.attribute.FileTime;

    // ScraperOutputVO, FileMetadataUtil and getFileType(String) come from the
    // asker's project and are not shown here; copy or import them as needed.
    public class Walker extends SimpleFileVisitor<Path> {
        private final String extractFile;
        private final FileMetadataUtil fileUtil = new FileMetadataUtil();

        public Walker(String extractFile) {
            this.extractFile = extractFile;
        }

        @Override
        public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attr)
                throws IOException {
            populateAndWrite(dir, attr);
            return CONTINUE;
        }

        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attr)
                throws IOException {
            populateAndWrite(file, attr);
            return CONTINUE;
        }

        @Override
        public FileVisitResult visitFileFailed(Path file, IOException exc) {
            //You should determine if you need this method or not
            return CONTINUE;
        }

        private void populateAndWrite(Path file, BasicFileAttributes attr)
                throws IOException {
            ScraperOutputVO so = new ScraperOutputVO();
            // getParent() and getFileName() return null for a root directory
            if (file.getParent() != null) {
                so.setFilePath(file.getParent().toString());
            }
            if (file.getFileName() != null) {
                so.setFileName(file.getFileName().toString());
            }
            so.setFileType(getFileType(file.toAbsolutePath().toString()));
            if (attr != null) {
                so.setDateCreated(createDate(attr.creationTime()));
                so.setDateLastModified(createDate(attr.lastModifiedTime()));
                so.setDateLastAccessed(createDate(attr.lastAccessTime()));
                // The size of a directory is not meaningful, so skip it
                if (!attr.isDirectory()) {
                    so.setFileSize(String.valueOf(attr.size()));
                }
            }
            so.setAuthors(fileUtil.getOwner(file));
            so.setFolderLink(file.toAbsolutePath().toString());
            writeCsvFileDtl(so);
        }

        // FileTime.toString() is ISO 8601 (UTC); keep "yyyy-MM-dd HH:mm"
        private String createDate(FileTime time) {
            String timeStr = time.toString();
            return timeStr.substring(0, 10) + " " + timeStr.substring(11, 16);
        }

        // Appends one CSV row. Any IOException propagates to the caller of
        // Files.walkFileTree, which stops the walk.
        private void writeCsvFileDtl(ScraperOutputVO scraperOutputVO)
                throws IOException {
            // Still opens and closes the file for every row, as in the question
            try (FileWriter writer = new FileWriter(extractFile, true)) {
                writer.append(scraperOutputVO.getFilePath());
                writer.append(',');
                writer.append(scraperOutputVO.getFileName());
                writer.append(',');
                writer.append(scraperOutputVO.getFileType());
                writer.append(',');
                writer.append(scraperOutputVO.getDateCreated());
                writer.append(',');
                writer.append(scraperOutputVO.getDateLastModified());
                writer.append(',');
                writer.append(scraperOutputVO.getDateLastAccessed());
                writer.append(',');
                writer.append(scraperOutputVO.getFileSize());
                writer.append(',');
                writer.append(scraperOutputVO.getAuthors());
                writer.append(',');
                writer.append(scraperOutputVO.getFolderLink());
                writer.append('\n');
            }
        }
    }
The method populateAndWrite is used in preVisitDirectory and visitFile; it populates each attribute of your ScraperOutputVO object and then sends it to the write method. I'm not sure whether you want to list directories; if you don't, just remove preVisitDirectory. I've added some null checks since getParent and getFileName can return null if you start at the root directory.
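To see why those null checks matter, here is a minimal sketch (not part of the original answer) showing that a root directory has neither a file name nor a parent. The class name RootPathDemo and the helper orEmpty are hypothetical names used only for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;

public class RootPathDemo {
    // Hypothetical helper: string form of a path component, or "" when it is absent (null).
    static String orEmpty(Path component) {
        return component == null ? "" : component.toString();
    }

    public static void main(String[] args) {
        // First root of the default file system, e.g. "/" or "C:\"
        Path root = FileSystems.getDefault().getRootDirectories().iterator().next();
        System.out.println(root.getFileName() == null); // true: a root has no name element
        System.out.println(root.getParent() == null);   // true: a root has no parent

        // A single-element relative path has no parent either
        System.out.println(Paths.get("a").getParent() == null); // true

        // The helper turns an absent component into an empty CSV field
        System.out.println("[" + orEmpty(root.getFileName()) + "]"); // []
    }
}
```

A helper like this would write an empty CSV field rather than the text "null" when a component is missing.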
You may need to tweak some attributes, because I didn't have access to your fileUtils and getFileType, so you should test to make sure you get the same values.
To launch the class you simply need to do something like this:
    public static void main(String args[]){
        Path root = Paths.get("Path to your directory");
        Walker walker = new Walker("Name of your csv file");
        try {
            Files.walkFileTree(root, walker);
        } catch (IOException e) {
            //you should handle exception here
            //log.info("Problem walking the directory")
            e.printStackTrace();
        }
    }
If you need more help just comment on the answer.
Edit: I found another improvement that lowers the execution time. The problem is that every time you write to the file, you create a new FileWriter, open the file, write to it and then close it. So for each file you visit, you're opening and closing the CSV file, which is a major drag on performance. With your current writeCsvFileDtl it took me about 170 seconds to write 100,000 entries. By leaving the FileWriter open, writing took less than one second. You should do something like the following; it is not very elegant (you could probably do something more readable or cleaner), but it will improve performance.
    import java.io.FileWriter;
    import java.io.IOException;

    public class Writer {
        static FileWriter writer = null;

        public static void openFileWriter() {
            try {
                writer = new FileWriter("C:/dev/file.txt", true);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        public static void writeToTheFileWithOpenEachTime(){
            try {
                writer.append("firstColumn");
                writer.append(',');
                writer.append("secondColumn");
                writer.append(',');
                writer.append("thirdColumn");
                writer.append(',');
                writer.append("fourthColumn");
                writer.append(',');
                writer.append("fifthColumn");
                writer.append(',');
                writer.append("sixthColumn");
                writer.append(',');
                writer.append("sevenColumn");
                writer.append(',');
                writer.append("eigthColumn");
                writer.append(',');
                writer.append("ninethColumn");
                writer.append('\n');
                writer.flush();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        public static void closeWriter() {
            try {
                writer.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
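One possible way to combine the keep-the-writer-open idea with the file walk is sketched below. It is an assumption-laden illustration, not the original author's code: the class name CsvWalk and the method export are hypothetical, it writes only the path and size columns, and it does no CSV quoting (a path containing a comma would need escaping). The writer is opened once in a try-with-resources block, so it is buffered for the whole walk and closed even if the walk fails.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;

public class CsvWalk {
    // Walks root and writes one "path,size" line per visited file to csv,
    // using a single buffered writer for the entire walk. Returns the row count.
    // The csv file should live outside root, or it will be listed as well.
    static int export(Path root, Path csv) throws IOException {
        final int[] rows = {0};
        try (final BufferedWriter out = Files.newBufferedWriter(csv, StandardCharsets.UTF_8)) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    out.write(file.toAbsolutePath().toString());
                    out.write(',');
                    out.write(Long.toString(attrs.size()));
                    out.newLine();
                    rows[0]++;
                    return FileVisitResult.CONTINUE;
                }
            });
        } // the writer is flushed and closed here, once
        return rows[0];
    }

    public static void main(String[] args) throws IOException {
        // Small self-contained demo on a temporary directory
        Path root = Files.createTempDirectory("walk-demo");
        Files.write(root.resolve("a.txt"), "hello".getBytes(StandardCharsets.UTF_8));
        Path csv = Files.createTempFile("out", ".csv");
        System.out.println(export(root, csv) + " row(s) written to " + csv);
    }
}
```

Compared with the static Writer class above, this avoids flushing after every row and leaves no shared mutable state, at the cost of passing the writer into the visitor.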
You are continually accessing the array for the current item, but never using the index for anything else. This is exactly what for-each loops are for.
    for (File f : listOfFiles) {
        //use f instead of listOfFiles[i]
    }
The following code is full of repetition:
    if(basicAttribs != null) {
        so.setDateCreated(basicAttribs.creationTime().toString().substring(0, 10) + " " + basicAttribs.creationTime().toString().substring(11, 16));
        so.setDateLastModified(basicAttribs.lastModifiedTime().toString().substring(0, 10) + " " + basicAttribs.lastModifiedTime().toString().substring(11, 16));
        so.setDateLastAccessed(basicAttribs.lastAccessTime().toString().substring(0, 10) + " " + basicAttribs.lastAccessTime().toString().substring(11, 16));
    }
Instead, make a function to perform the repeated action.
    // Returns "yyyy-MM-dd HH:mm" taken from the ISO 8601 string form of the FileTime
    public String createDate(FileTime time) {
        String timeStr = time.toString();
        return timeStr.substring(0, 10) + " " + timeStr.substring(11, 16);
    }
There should also be a way to create a Date from a FileTime, which would let you use SimpleDateFormat instead of doing the string manipulation by hand.
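A minimal sketch of that idea, assuming the same "yyyy-MM-dd HH:mm" output as the substring version: FileTime.toMillis() gives milliseconds since the epoch, which the Date constructor accepts. The class name FileTimeFormat and the method format are hypothetical. The time zone is passed explicitly because FileTime.toString() is rendered in UTC, so UTC is what reproduces the original output.

```java
import java.nio.file.attribute.FileTime;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class FileTimeFormat {
    // Formats a FileTime as "yyyy-MM-dd HH:mm" in the given time zone.
    // A new SimpleDateFormat is created per call because it is not thread-safe.
    static String format(FileTime time, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm", Locale.ROOT);
        fmt.setTimeZone(zone);
        return fmt.format(new Date(time.toMillis()));
    }

    public static void main(String[] args) {
        FileTime epoch = FileTime.fromMillis(0L);
        System.out.println(format(epoch, TimeZone.getTimeZone("UTC"))); // 1970-01-01 00:00
    }
}
```

When formatting millions of entries on a single thread, the formatter could be created once and reused instead of per call.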
I know this is an old post, but to speed up the suggested solution, consider avoiding (if you can) the FOLLOW_LINKS option shown in the code below; this improved performance quite significantly in an app I've recently developed:
    EnumSet<FileVisitOption> opts = EnumSet.of(FileVisitOption.FOLLOW_LINKS);
    ...
    Files.walkFileTree(path, opts, Integer.MAX_VALUE, visitor);
If you don't need to follow links, you can simply call the simpler version of walkFileTree:
    Files.walkFileTree(path, visitor);
This calls the following under the covers:
    Files.walkFileTree(path, EnumSet.noneOf(FileVisitOption.class), Integer.MAX_VALUE, visitor);