I've been working on a configurable tool that finds files by a certain specification and returns a list. I attempted to approach this in a more LINQ way and tried using appropriate design decisions, all while keeping it reasonably efficient.
Here is my Finder
class:
public class Finder
{
public List<string> Extensions;
public List<string> TargetDirectories;
public bool Recursive { get; set; }
private string _currentPath;
private List<string> _files = new List<string>();
public Finder()
{
Extensions = new List<string>();
TargetDirectories = new List<string>();
_currentPath = Directory.GetCurrentDirectory();
}
public string[] GetFiles()
{
if (_files.Count != 0)
return _files.ToArray();
return new string[0];
}
public void Find()
{
try
{
foreach (var directory in TargetDirectories)
{
Search(directory);
}
}
catch (Exception e)
{
Console.WriteLine(e.Message);
}
}
private void Search(string path)
{
var pathPoint = new DirectoryInfo($"{_currentPath}{path}");
var files = Directory.EnumerateFiles(pathPoint.FullName, "*.*")
.Where(s => Extensions.Contains(Path.GetExtension(s)))
.ToList();
files.ForEach(_files.Add);
if (Recursive)
{
var subDirectories = Directory.GetDirectories(pathPoint.FullName);
foreach (var subDir in subDirectories)
{
Search($"{path}/{new DirectoryInfo(subDir).Name}");
}
}
}
}
And here's the usage:
var finder = new Finder();
finder.Extensions.Add(".cs");
finder.Extensions.Add(".dll");
finder.TargetDirectories.Add("/src");
finder.TargetDirectories.Add("/assembly");
finder.Recursive = true;
finder.Find();
foreach (var file in finder.GetFiles())
{
Console.WriteLine(file);
}
The Output
It returns all of the files within every folder of /src
and /assembly
. If I change Recursive
to false or omit it, the result will be only the files within the targeted directories. Extensions are also specified to retrieve files only of that type.
Here's what the output looks like:
/long/path/name/src/helloworld.cs
/long/path/name/src/class.cs
/long/path/name/src/subdirectory/anotherclass.cs
/long/path/name/assembly/library.dll
This is the intended return.
Design
I store the directory information so I can scale that information and use fields like Name
and FullName
. The files for specifically that searched directory are inserted into a list and allocated to a returnable list. The return list, _files
, is publicly immutable.
Efficiency
I'm unsure if there's any notable difference between using the LINQ in my Search
method or using a for
loop for iteration in terms of performance. I also believe there might be a way I can simply use the EnumerateFiles
return array without converting to a list then back to an array when using GetFiles()
.
Any suggestions on how I can improve my class? I'm trying to better my C# knowledge and program; any feedback would be considerably helpful and will assist me in further implementing features.
EDIT: I am adopting suggestions actively. You can see the (amateur) repository here: https://github.com/TGibsonn/FastCompile
5 Answers 5
Your implementation does not need be recursive because another overload of the Directory.EnumerateFiles Method (String, String, SearchOption)
already can do that:
Returns an enumerable collection of file names that match a search pattern in a specified path, and optionally searches subdirectories.
As far as the API of the Finder
is concerned I find it is not user-friendly. The main method should have this signature that requires a criteria parameter:
public interface IFileFinder
{
public IEnumerable<string> FindFiles(FindFilesCriteria criteria);
}
This parameter could have an interface like this one where you can pass it directory name, file extensions and whether it should search recursively:
public interface IFindFilesCriteria
{
List<string> DirectoryNames { get; }
List<string> FileExtensions { get; }
bool SearchSubdirectories { get; }
}
This would be all you need. If you want to have a more convenient fluent API then you should write a FileFinderbuilder
that will allow you to construct the parameters and call the FindFiles
method eventually:
var files = fileFinder.FindFiles(new FindFilesCriteria
{
DirectoryNames = { "/src", "/assembly" },
FileExtensions = { ".cs", ".dll" },
SearchSubdirectories = true
});
This API would allow you to store and reuse the FindFilesCriteria
and also to mock the IFileFinder
interface for unit-testing.
-
\$\begingroup\$ In regards to the
SearchOption
: I had used this originally but decided to follow the advice in stackoverflow.com/questions/19354625/… but I'm not entirely sure. I'll have to weigh in the options more. \$\endgroup\$Tristan Gibson– Tristan Gibson2017年08月13日 13:33:12 +00:00Commented Aug 13, 2017 at 13:33 -
1\$\begingroup\$ @TristanGibson mhmm and yet you don't use any of the apparent advantages of your own recursion ;-] \$\endgroup\$t3chb0t– t3chb0t2017年08月13日 13:39:06 +00:00Commented Aug 13, 2017 at 13:39
-
\$\begingroup\$ This provides me with a lot of insight on how I can better scale it and provide more usability as something like a library. Not only that, but I think I can better maintain it as a whole with how it's being split into separate, distinguishable components. Thank you for the feedback. :) I'll likely add logic to how it's being used recursively later; or just make it an optional thing (a boolean for some sort of logic that could be implemented). \$\endgroup\$Tristan Gibson– Tristan Gibson2017年08月13日 20:56:06 +00:00Commented Aug 13, 2017 at 20:56
-
\$\begingroup\$ One of the benefits to using
SearchOption.AllDirectories
is that the iterator created byEnumerateFiles
is lazy: it scans as you consume the results. That's one of the core tenets of LINQ: don't do more work up front than you actually need to. OP's original implementation would have traversed the entire search space recursively before returning any results. A lazy scan uses much less memory, and the consumer can start processing the results right away. The I/O costs are spread out across the search space, so if the consumer stops early, you won't do more work than necessary. \$\endgroup\$Mike Strobel– Mike Strobel2017年09月06日 21:57:53 +00:00Commented Sep 6, 2017 at 21:57 -
\$\begingroup\$ @MikeStrobel I don't think this is right. The
SearchOption
has nothing to do with the lazyness. It just specifies howGetFiles
orEnumerateFiles
should behave and the latter one is lazy, not the option, the method.EnumerateFiles
is lazy despite theSearchOption
. \$\endgroup\$t3chb0t– t3chb0t2017年09月07日 04:39:35 +00:00Commented Sep 7, 2017 at 4:39
As someone who's been writing cross-platform tools on .Net Core, this doesn't do what I expect it to do.
finder.TargetDirectories.Add("/src");
I expect that to target a directory named src
at the root of the file system. On Linux: /src
; on windows C:\src
. It would be more clear if the user passed in a relative file path or a complete file path.
Also, please don't concatenate file paths this way.
$"{path}/{new DirectoryInfo(subDir).Name}"
This is terribly error prone. It's hard to get right the first time and it's even harder to refactor later. Always use System.IO.Path.Combine()
.
-
\$\begingroup\$ I've found that I can use
var pathPoint = new DirectoryInfo(path);
so that the/
is not needed when adding directories to the target directories list..Add("src")
would then work. It'll automatically parse the directional slash.Path.Combine(path, subDir))
works great for the recursiveSearch
parameter. \$\endgroup\$Tristan Gibson– Tristan Gibson2017年08月13日 16:53:46 +00:00Commented Aug 13, 2017 at 16:53
I would return IEnumerable<string>
from GetFiles
. It is more general approach. User will decide then what collection he want – array, List
or something else. You even don't need to check whether _files
contains elements or not.
public IEnumerable<string> GetFiles()
{
return _files.AsReadOnly();
}
Also AsReadOnly
will wrap your internal List
to readonly collection preventing user from changing it. And then if user want to use array he will just call ToArray
on returned value.
At now user can set TargetDirectories
and Extensions
to null
. I think it is bad. It is twice bad since you don't check these fields on null
when use them. I recommend you to turn TargetDirectories
and Extensions
into readonly properties like that:
public List<string> Extensions { get; } = new List<string>();
public List<string> TargetDirectories { get; } = new List<string>();
You even don't need to initialize them in constructor.
Also I would go away from TargetDirectories
and Extensions
methods. In fact user don't need all the methods and properties of List
. I would create methods like
public void AddTargetDirectories(params string[] directories)
{
if (directories == null)
throw new ArgumentNullException(nameof(directories));
_targetDirectories.AddRange(directories);
}
public void AddExtensions(params string[] extensions)
{
if (extensions == null)
throw new ArgumentNullException(nameof(extensions));
_extensions.AddRange(extensions);
}
It will also simplify usage of your API:
var finder = new Finder();
finder.AddExtensions(".cs", ".dll");
finder.AddTargetDirectories("/src", "/assembly");
finder.Recursive = true;
finder.Find();
-
\$\begingroup\$ Might I ask: What's the purpose of the
params string[]
check for null within theAddTargetDirectories
andAddExtensions
? If I leave theAddExtensions
, for example, parameters empty it won't get null. Would it be better to checkif (extensions.Length == 0)
, then throw an ArgumentException? Just curious on this approach. \$\endgroup\$Tristan Gibson– Tristan Gibson2017年08月13日 16:22:53 +00:00Commented Aug 13, 2017 at 16:22 -
\$\begingroup\$
params
don't do magic preventing your from passing a null array. It just lets you to specify arguments without wrapping them into an array manually. But the type of parameter marked withparams
keyword is array. Simple example of how you can get null array as an input argument:AddTargetDirectories(null)
. Here null will be treated as value of the entire input array and not a null string. You should always check input arguments on null in public API if this can cause internal exceptions. \$\endgroup\$Maxim– Maxim2017年08月13日 23:21:46 +00:00Commented Aug 13, 2017 at 23:21
Potentially confusing API
You should consider the implications of repeated results appearing if you have not. It is also possible to call Find()
twice, which doesn't clear the results. These are design decisions, but matters that are not documented or necessarily the expected behaviour, and are certainly things that should be tested. Indeed, it is not apparent that one must call Search
before calling GetFiles()
.
Personally, I would remove GetFiles()
, and just keep Find()
; if the caller wants to cache the values they can do so. This would work well with T3chb0t's suggestion of pulling out the search criteria, allowing the Finder
class to become stateless (if you also remove _currentDirectory
which I discuss below), reducible to a static method, nullifying any concerns about misuse or thread-safety (which while not always a requirement is generally worth keeping in mind and documenting, if the code is implicitly stateless), which will make it a more appealing API.
Code
GetFiles()
Others have suggested sensible changes to this already; I'll just add that you could make it a property (since once you call Search
it is essentially constant), and returning IReadOnlyList<string>
may be another option.
Note, however, that your current code has the nice feature of returning an expression that can't be changed by surprise. If you return a projection of the same object (which casting or calling AsReadOnly()
would do) then the Finder
may modify it, and these changes will appear elsewhere.
Such a concern would be a perfectly legitimate reason to clone the list. I would suggest an implementation more like this:
/// <summary> Returns the list of files found by Search </summary>
/// <returns> A cloned copy of the list that is not subject to further modification </returns>
public IReadOnlyList<string> GetFiles()
{
return _files.ToArray(); // no need to check if it is empty
}
Note that ToArray()
works just fine with an empty list. Inline documentation (\\\
) on any public facing members is always appreciated.
Find()
This is a confusingly named method, it is not clear what it does. PopulateFileList
or similar would better express the usage of calling Find()
and then GetFiles()
. I won't comment on the nature of this usage itself (other's have done a good job already).
I would add some inline documentation explaining that you can get the result of the Search
by calling GetFiles()
.
Exceptions
Find()
is silently swallowing exceptions (i.e. not rethrowing, Console output doesn't count). I subscribe wholly to the 'fail fast' doctrine: if your code crashes for a reason you aren't expecting, then let it crash now rather than later, and let it tell the world. Why should Search(string)
crash? More importantly, what do your users want you to do when it crashes? I'll bet they don't want you to mess up their Console output, and continue running having not successfully completed the task your code was asked to do!
Error handling is always a design decision, I'm just outlining a few possibilities, you need to design how you want the API to be used, document this, and then implement it. Note that if you do make the methods stateless as suggest above, then the issues with inconsistent state just go away, which is ideal.
You should either be catching specific exceptions (e.g. directory not found) and documenting your behaviour thereabout (e.g. Find()
will attempt to search every directory, but will ignore any that do not exist), or not trying at all. Currently, the behaviour is "keep searching until something goes wrong, and then quietly stop". It is absolutely fine for your API to have the behaviour "if an exception is thrown, then the system enters an inconsistent state, and the calling code can deal with it" (after all, either the calling code caused the exception through misuse, or your code is broken, and you want everyone to hear about it so that you can fix it!).
You could catch exceptions, and then attempt to return the Finder
to a consistent state, before indicating the failure to the calling code. For example, in your catch
, you could clear _files
, and then rethrow the exception. This could trivially be documented and understood by the user "If an exception occurs during a call to Find()
, then the Finder
will clear its list of files, and rethrow the exception".
Search(string)
PathPoint
This seems an unnecessary and fragile use of string interpolation:
var pathPoint = new DirectoryInfo($"{_currentPath}{path}");
If I were joining 2 strings, I'd just +
them, much more obvious (I see no issue with +
being type dependant; type coercion is a problem when misused, not intrinsically evil, certainly not with a static type checker watching over you). Even so, there is a better way of assembling paths, Path.Combine
.
var pathPoint = Path.Combine(_currentPath, path);
This method can be platform agnostic, so can use \
on Windows and /
on Linux, and will automatically insert separators where necessary: combining "C:\MyDir" with "file.txt" gives "C:\MyDir\file.txt".
This also has the incidental benefit of allowing your program to accept full paths, which currently it cannot, because you pre-pend the _currentPath
to anything, owing to the specific behaviour of Path.Combine(string, string)
when presented with a fully qualified path on the right hand side (under windows, at least). This is a use-case you should think about.
_currentPath
This method depends on _currentPath
, which is (silently) assigned when the class is constructed. What is stopping someone creating the Finder
, changing working directory, and then calling Find()
? Better would be to determine the working directory when you call Find()
. This is less confusing (people are used to stuff using the current working directory, rather than caching one). Alternatively, document this and consider making _currentDirectory
configurable.
Personally, I would remove _currentDirectory
completely: the methods in System.IO
all allow you to provide a partial path to which it prepends the working directory, so you aren't gaining anything by doing it yourself (this would remove the issue described above also when providing fully qualified paths).
DirectoryInfo.Name
You can use DirectoryInfo.FullName
to get the full path, rather than assembling it yourself (a property you are already using elsewhere). But you already have this information, because GetDirectories(string)
returns the full paths already!
files.ForEach(_files.Add);
This can be rewritten as _files.AddRange(files) which I think is clearer. This also allows you to keep files
as an IEnumerable<string>
rather than calling .ToList()
on it, as List<T>.AddRange(IEnumerable<T>)
helpfully takes an IEnumerable<>
. This will be more efficient, but you shouldn't worry about performance until you have a reason to.
Create a new model for Filter Search Criteria
public class FileSearchCriteria
{
public string[] Extensions { get; set; } = new string[0];
public string BaseDirectory { get; set; }
public string[] TargetDirectories { get; set; } = new string[0];
public IEnumerable<string> AbsoluteTargetDirectories
{
get
{
return TargetDirectories?
.Select(td => Path.Combine(BaseDirectory, td))
.Where(td => !string.IsNullOrWhiteSpace(td) && Directory.Exists(td));
}
}
public bool Recursive { get; set; }
public SearchOption SearchOption
{
get
{
return Recursive ? SearchOption.AllDirectories : SearchOption.TopDirectoryOnly;
}
}
}
FileSearchCriteria
class have below two calculated Properties:
SearchOption
- Converts the Recursive boolean to SearchOption filterAbsoluteTargetDirectories
- Converts the relative paths to Absolute Paths
Simplified finder class with LINQ. This logic will support with multiple directories and extensions.
public class Finder
{
public IEnumerable<string> Find(FileSearchCriteria criteria)
{
return criteria.AbsoluteTargetDirectories.SelectMany(td => FindFiles(td, criteria.Extensions, criteria.SearchOption));
}
public IEnumerable<string> FindFiles(string folderPath, string[] extensions, SearchOption searchOption)
{
return Directory.GetFiles(folderPath, "*.*", searchOption)
.Where(s =>
!extensions.Any() ||
extensions.Any(e => string.Compare(e, Path.GetExtension(s), true) == 0)
);
}
}
Finder class is consuming the calculated properties created in FileSearchCriteria class.
Now, we can call the method like below
var searchCriteria = new FileSearchCriteria {
Extensions = new string[]{ ".cs", ".dll" },
BaseDirectory = Directory.GetCurrentDirectory(),
TargetDirectories = new string[] { "src", "assembly" },
Recursive = false,
};
var result = new Finder().Find(searchCriteria);
-
-
\$\begingroup\$ Added additional notes to my answer. Calling the method might looks same. But updated the filter contract and Finder class logic. \$\endgroup\$Narendra Bhogavalli– Narendra Bhogavalli2017年08月13日 19:00:28 +00:00Commented Aug 13, 2017 at 19:00
-
1\$\begingroup\$ Added BaseDirectory to the Filter model. Now consumer of the class can pass the base path. Thank You for your feedback. \$\endgroup\$Narendra Bhogavalli– Narendra Bhogavalli2017年08月13日 19:18:42 +00:00Commented Aug 13, 2017 at 19:18