I have thousands of files with a specific file extension, spread across thousands of subfolders. What is the fastest way to search them with a pattern? I tried DirectoryInfo.GetFiles(rootfolder) (~8 minutes) and a custom recursive method (~5 minutes):
    private void WalkDirectoryTree(DirectoryInfo dr, string searchname)
    {
        System.IO.FileInfo[] files = null;
        System.IO.DirectoryInfo[] subDirs = null;
        try
        {
            files = dr.GetFiles(searchname + ".*");
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.Message);
        }
        if (files != null)
        {
            foreach (FileInfo fi in files)
            {
                allFiles.Add(fi);
            }
            subDirs = dr.GetDirectories();
            foreach (DirectoryInfo di in subDirs)
            {
                WalkDirectoryTree(di, searchname);
            }
        }
    }
Is there any faster way to do it?
2 Answers
Try parallelization. Instead of:
    foreach (DirectoryInfo di in subDirs) { WalkDirectoryTree(di, searchname); }
use:
    Parallel.ForEach(subDirs, dir => WalkDirectoryTree(dir, searchname));
Note that allFiles will then be accessed concurrently, so change the collection to a thread-safe one such as ConcurrentBag.
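Putting the two suggestions together, a minimal sketch of the parallel walk might look like the following. The class name ParallelFileSearch and the method FindAll are hypothetical names chosen for illustration, and the sketch silently skips directories it cannot read instead of showing a MessageBox as the question's code does.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original question or answer.
public static class ParallelFileSearch
{
    // Collects every file under root whose name matches searchName + ".*".
    public static ConcurrentBag<FileInfo> FindAll(DirectoryInfo root, string searchName)
    {
        var results = new ConcurrentBag<FileInfo>();
        Walk(root, searchName, results);
        return results;
    }

    private static void Walk(DirectoryInfo dir, string searchName, ConcurrentBag<FileInfo> results)
    {
        FileInfo[] files;
        DirectoryInfo[] subDirs;
        try
        {
            files = dir.GetFiles(searchName + ".*");
            subDirs = dir.GetDirectories();
        }
        // Assumption: unreadable or vanished directories are skipped quietly.
        catch (UnauthorizedAccessException) { return; }
        catch (DirectoryNotFoundException) { return; }

        // ConcurrentBag.Add is thread-safe, so no lock is needed here.
        foreach (FileInfo file in files)
        {
            results.Add(file);
        }

        // Each subdirectory is walked as a separate parallel work item.
        Parallel.ForEach(subDirs, sub => Walk(sub, searchName, results));
    }
}
```

A call such as ParallelFileSearch.FindAll(new DirectoryInfo(rootfolder), searchname) would then replace the original WalkDirectoryTree call. A ConcurrentBag does not preserve any order, so sort the results afterwards if order matters.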
- @greenhoorn I would be glad to hear the improvements you had after the replacement. – Bruno Costa, Dec 19, 2014 at 13:45
- I will share it as soon as I test it. Could take 2 weeks :-) – greenhoorn, Dec 19, 2014 at 14:22
- Thanks for your answer! It still takes 2 minutes but that's significantly faster than before. I would say 2 minutes for millions of files is acceptable :-) – greenhoorn, Jan 9, 2015 at 9:41
You can just use the DirectoryInfo.EnumerateFiles() method, which returns an IEnumerable<FileInfo>. The entries are therefore evaluated lazily: each one is produced only when the enumerator reaches it.
    private void WalkDirectoryTree(DirectoryInfo dr, string searchname)
    {
        foreach (FileInfo file in FindFiles(dr, searchname + ".*"))
        {
            // process file
            allFiles.Add(file);
        }
    }

    public IEnumerable<FileInfo> FindFiles(DirectoryInfo startDirectory, string pattern)
    {
        return startDirectory.EnumerateFiles(pattern, SearchOption.AllDirectories);
    }
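One caveat with SearchOption.AllDirectories is that a single unreadable folder throws an UnauthorizedAccessException and ends the whole enumeration, whereas the question's recursive method handles errors per directory. On .NET Core 2.1 or later (an assumption about the target runtime, since this overload does not exist in .NET Framework), a sketch using EnumerationOptions could look like this. The class name LazyFileSearch is hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; assumes .NET Core 2.1+ where EnumerationOptions exists.
public static class LazyFileSearch
{
    public static IEnumerable<FileInfo> FindFiles(DirectoryInfo startDirectory, string pattern)
    {
        var options = new EnumerationOptions
        {
            // Descend into all subdirectories, like SearchOption.AllDirectories.
            RecurseSubdirectories = true,
            // Skip folders that cannot be read instead of throwing.
            IgnoreInaccessible = true,
            // The default skips hidden and system files; 0 includes them.
            AttributesToSkip = 0
        };
        // Still lazy: entries are produced as the caller iterates.
        return startDirectory.EnumerateFiles(pattern, options);
    }
}
```

The result is consumed the same way as before, for example with foreach (FileInfo file in LazyFileSearch.FindFiles(dr, searchname + ".*")).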
- So the method GetFiles() does some more processing than EnumerateFiles(), do I get this right? I will give this a try too, thanks. – greenhoorn, Dec 19, 2014 at 12:07
- No, but EnumerateFiles yields each FileInfo only when it is accessed (in a foreach or by calling .ToList()). – Heslacher, Dec 19, 2014 at 12:08
- @Heslacher Laziness does not solve slowness. And in this scenario it will be the same as a non-lazy algorithm, because you are evaluating all items immediately and putting them in a list (which can be done, like you said, with the ToList() method). – Bruno Costa, Dec 19, 2014 at 13:43
- … allFiles? Do you need FileInfo or the filename?