I like Python's os.walk and I was missing it in C#. It's a very convenient API, so I thought I'd create my own utility that does something similar.
The original API uses tuples as return values and slightly different names. I adjusted them to match C# conventions. Instead of returning a tuple, which would be difficult to override/mock in a test, I used an interface. Also, the method signature would be very long and inconvenient.
It works with a single loop that creates a DirectoryTreeNode for each directory, which in turn can enumerate the directories and files in that directory. A node can also be turned into a tuple, which an extension makes possible. I used a Queue and not a Stack because the latter would change the order to descending per directory. There is very simple exception handling that passes any exception to the caller and lets it decide how to handle it. Alternatively, the WalkSilently extension ignores them.
I'm going to use it primarily for searching for directories and/or files and to replace an older API.
For now, it doesn't have to behave exactly like Python's API does; just the basic stuff. I don't know what to think of symbolic links yet.
public interface IDirectoryTree
{
    [NotNull, ItemNotNull]
    IEnumerable<IDirectoryTreeNode> Walk([NotNull] string path, [NotNull] Action<Exception> onException);
}

public class DirectoryTree : IDirectoryTree
{
    public IEnumerable<IDirectoryTreeNode> Walk(string path, Action<Exception> onException)
    {
        if (path == null) throw new ArgumentNullException(nameof(path));
        if (onException == null) throw new ArgumentNullException(nameof(onException));

        var nodes = new Queue<DirectoryTreeNode>
        {
            new DirectoryTreeNode(path)
        };

        while (nodes.Any())
        {
            var current = nodes.Dequeue();
            yield return current;

            try
            {
                foreach (var directory in current.DirectoryNames)
                {
                    nodes.Enqueue(new DirectoryTreeNode(Path.Combine(current.DirectoryName, directory)));
                }
            }
            catch (Exception inner)
            {
                onException(inner);
            }
        }
    }
}

[PublicAPI]
public interface IDirectoryTreeNode
{
    [NotNull]
    string DirectoryName { get; }

    [NotNull, ItemNotNull]
    IEnumerable<string> DirectoryNames { get; }

    [NotNull, ItemNotNull]
    IEnumerable<string> FileNames { get; }
}

internal class DirectoryTreeNode : IDirectoryTreeNode
{
    internal DirectoryTreeNode(string path)
    {
        DirectoryName = path;
    }

    public string DirectoryName { get; }

    public IEnumerable<string> DirectoryNames => Directory.EnumerateDirectories(DirectoryName).Select(Path.GetFileName);

    public IEnumerable<string> FileNames => Directory.EnumerateFiles(DirectoryName).Select(Path.GetFileName);
}
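The Queue versus Stack remark above can be illustrated with a small sketch. It does not touch the file system; the in-memory Tree dictionary and the TraversalOrderDemo class are hypothetical stand-ins, assumed only for illustration. With a queue, siblings come out in the order they were enumerated; with a stack, the last sibling pushed is visited first.

```csharp
using System;
using System.Collections.Generic;

public static class TraversalOrderDemo
{
    // Hypothetical in-memory tree: each directory maps to its subdirectories in ascending order.
    private static readonly Dictionary<string, string[]> Tree = new Dictionary<string, string[]>
    {
        ["root"] = new[] { "a", "b" },
        ["a"] = new[] { "a1", "a2" },
        ["b"] = new[] { "b1" },
    };

    private static string[] Children(string name) =>
        Tree.TryGetValue(name, out var children) ? children : Array.Empty<string>();

    // Queue: first in, first out, so siblings keep their enumeration order.
    public static List<string> WalkWithQueue(string root)
    {
        var result = new List<string>();
        var pending = new Queue<string>();
        pending.Enqueue(root);
        while (pending.Count > 0)
        {
            var current = pending.Dequeue();
            result.Add(current);
            foreach (var child in Children(current)) pending.Enqueue(child);
        }
        return result;
    }

    // Stack: last in, first out, so the last sibling pushed is visited first.
    public static List<string> WalkWithStack(string root)
    {
        var result = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            var current = pending.Pop();
            result.Add(current);
            foreach (var child in Children(current)) pending.Push(child);
        }
        return result;
    }
}
```

For this tree, WalkWithQueue("root") visits root, a, b, a1, a2, b1, while WalkWithStack("root") visits root, b, b1, a, a2, a1. The queue therefore also makes the walk breadth-first, which differs from the top-down, depth-first order of os.walk.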
Additional functionality, like turning a node into a tuple or checking whether it exists, is provided by extensions:
public static class DirectoryTreeNodeExtensions
{
    public static void Deconstruct(
        [CanBeNull] this IDirectoryTreeNode directoryTreeNode,
        [CanBeNull] out string directoryName,
        [CanBeNull] out IEnumerable<string> directoryNames,
        [CanBeNull] out IEnumerable<string> fileNames)
    {
        directoryName = directoryTreeNode?.DirectoryName;
        directoryNames = directoryTreeNode?.DirectoryNames;
        fileNames = directoryTreeNode?.FileNames;
    }

    public static bool Exists(
        [CanBeNull] this IDirectoryTreeNode directoryTreeNode)
    {
        // Empty string does not exist and it'll return false.
        return Directory.Exists(directoryTreeNode?.DirectoryName ?? string.Empty);
    }
}
In case you are wondering how the Queue initializer works, I have a helper extension for it:
public static class QueueExtensions
{
    public static void Add<T>(this Queue<T> queue, T item)
    {
        queue.Enqueue(item);
    }
}
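As a minimal sketch of why this works: since C# 6, a collection initializer may bind to an accessible Add extension method, and the compiler turns every element in the braces into a call to it. The QueueInitializerExtensions and QueueInitializerDemo names below are hypothetical and exist only to keep the example self-contained.

```csharp
using System;
using System.Collections.Generic;

public static class QueueInitializerExtensions
{
    // Same shape as the helper above: forwards Add to Enqueue.
    public static void Add<T>(this Queue<T> queue, T item) => queue.Enqueue(item);
}

public static class QueueInitializerDemo
{
    public static Queue<string> Create()
    {
        // Each element is compiled into a call to Add(...), which here is the
        // extension method, so the items are enqueued in the order written.
        return new Queue<string> { "first", "second" };
    }
}
```

Dequeuing from the result yields "first" and then "second". Without the extension in scope, the initializer does not compile, because Queue<T> has no Add method of its own.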
Example 1 - Basic usage
The usage is very easy. Create or inject the DirectoryTree and call Walk or WalkSilently:
Used directly:
var directoryTree = new DirectoryTree();
directoryTree
    .WalkSilently(@"c:\temp")
    .Where(n => !n.DirectoryName.Contains(".git"))
    .Take(100)
    .Select(node => node.DirectoryName)
    .Dump();
or with a loop:
foreach (var (dirpath, dirnames, filenames) in directoryTree
    .WalkSilently(@"c:\temp")
    .Where(n => !n.DirectoryName.Contains(".git"))
    .Take(10))
{
    filenames.Dump();
}
Filtering
I've added a couple of extensions for filtering the results. They work by creating a DirectoryTreeNodeFilter that contains the LINQ expression selecting or skipping directories and files:
internal class DirectoryTreeNodeFilter : IDirectoryTreeNode
{
    internal DirectoryTreeNodeFilter(string path, IEnumerable<string> directoryNames, IEnumerable<string> fileNames)
    {
        DirectoryName = path;
        DirectoryNames = directoryNames;
        FileNames = fileNames;
    }

    public string DirectoryName { get; }

    public IEnumerable<string> DirectoryNames { get; }

    public IEnumerable<string> FileNames { get; }
}
and here are the extensions (there's also the WalkSilently method):
public static class DirectoryTreeExtensions
{
    [NotNull, ItemNotNull]
    public static IEnumerable<IDirectoryTreeNode> WalkSilently([NotNull] this IDirectoryTree directoryTree, [NotNull] string path)
    {
        if (directoryTree == null) throw new ArgumentNullException(nameof(directoryTree));
        if (path == null) throw new ArgumentNullException(nameof(path));

        return directoryTree.Walk(path, _ => { });
    }

    [NotNull, ItemNotNull]
    public static IEnumerable<IDirectoryTreeNode> SkipDirectories([NotNull] this IEnumerable<IDirectoryTreeNode> nodes, [NotNull][RegexPattern] string directoryNamePattern)
    {
        if (nodes == null) throw new ArgumentNullException(nameof(nodes));
        if (directoryNamePattern == null) throw new ArgumentNullException(nameof(directoryNamePattern));

        return
            from node in nodes
            where !node.DirectoryName.Matches(directoryNamePattern)
            select new DirectoryTreeNodeFilter
            (
                node.DirectoryName,
                from dirname in node.DirectoryNames where !dirname.Matches(directoryNamePattern) select dirname,
                node.FileNames
            );
    }

    [NotNull, ItemNotNull]
    public static IEnumerable<IDirectoryTreeNode> SkipFiles([NotNull] this IEnumerable<IDirectoryTreeNode> nodes, [NotNull][RegexPattern] string fileNamePattern)
    {
        if (nodes == null) throw new ArgumentNullException(nameof(nodes));
        if (fileNamePattern == null) throw new ArgumentNullException(nameof(fileNamePattern));

        return
            from node in nodes
            select new DirectoryTreeNodeFilter
            (
                node.DirectoryName,
                node.DirectoryNames,
                from fileName in node.FileNames where !fileName.Matches(fileNamePattern) select fileName
            );
    }

    [NotNull, ItemNotNull]
    public static IEnumerable<IDirectoryTreeNode> WhereDirectories([NotNull] this IEnumerable<IDirectoryTreeNode> nodes, [NotNull][RegexPattern] string directoryNamePattern)
    {
        if (nodes == null) throw new ArgumentNullException(nameof(nodes));
        if (directoryNamePattern == null) throw new ArgumentNullException(nameof(directoryNamePattern));

        return
            from node in nodes
            where node.DirectoryName.Matches(directoryNamePattern)
            select new DirectoryTreeNodeFilter
            (
                node.DirectoryName,
                from dirname in node.DirectoryNames where dirname.Matches(directoryNamePattern) select dirname,
                node.FileNames
            );
    }

    [NotNull, ItemNotNull]
    public static IEnumerable<IDirectoryTreeNode> WhereFiles([NotNull] this IEnumerable<IDirectoryTreeNode> nodes, [NotNull][RegexPattern] string fileNamePattern)
    {
        if (nodes == null) throw new ArgumentNullException(nameof(nodes));
        if (fileNamePattern == null) throw new ArgumentNullException(nameof(fileNamePattern));

        return
            from node in nodes
            select new DirectoryTreeNodeFilter
            (
                node.DirectoryName,
                node.DirectoryNames,
                from fileName in node.FileNames
                where fileName.Matches(fileNamePattern)
                select fileName
            );
    }

    private static bool Matches(this string name, [RegexPattern] string pattern)
    {
        return Regex.IsMatch(name, pattern, RegexOptions.IgnoreCase);
    }
}
Example 2 - Extensions
The extensions can be chained very easily to find what we need:
var directoryTree = new DirectoryTree();
directoryTree
    .WalkSilently(@"c:\temp")
    .SkipDirectories("\\.git")
    //.SkipFiles("\\.(cs|dll)$")
    .WhereDirectories("Project")
    .WhereFiles("\\.cs$")
    .Dump();
There is very little magic here but maybe still room for improvements?
2 Answers
Your code is clean, straightforward, and easy to follow. There is very little to comment on here.
1) You should consider using braces always. You don't have to, but they will save you a bug at some point (or you'll have to fix someone else's bug introduced into your code).
2) If you expect a result, enforce that result in your code:
// Empty string does not exist and it'll return false.
return Directory.Exists(directoryTreeNode?.DirectoryName ?? string.Empty);
That should be:
return directoryTreeNode?.DirectoryName == null
    ? false
    : Directory.Exists(directoryTreeNode.DirectoryName);
Now you aren't relying on .NET's behavior not changing, and your code is explicit about what it does instead of needing a comment. For example (it 99.99999% won't happen), .NET could change Directory's behaviour five years down the road so that the empty string always resolves to the current directory, and this could start returning true for the empty string.
3) If this code is performance critical, consider using loops instead of LINQ in the implementation. LINQ sets up quite a complex state machine, and while it enables prettier code, it does slow down hot paths noticeably.
4) I don't have a better name, but WalkSilently didn't immediately suggest that it just swallowed exceptions. What I would do instead is just have Walk(string path, Action<Exception> onException = null) and only execute the action if one was passed in:
catch (Exception inner)
{
    if (onException != null)
    {
        onException(inner);
    }
}
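The null check in point 4 can also be written with the null-conditional operator. The sketch below shows the pattern on a hypothetical TryRun helper rather than on Walk itself; both the helper and the OptionalCallbackDemo class are assumptions made to keep the example self-contained.

```csharp
using System;

public static class OptionalCallbackDemo
{
    // Runs the action and forwards any exception to the optional handler.
    // Returns true when the action completed, false when it threw.
    public static bool TryRun(Action action, Action<Exception> onException = null)
    {
        try
        {
            action();
            return true;
        }
        catch (Exception inner)
        {
            // Null-conditional invoke: a no-op when no handler was passed in.
            onException?.Invoke(inner);
            return false;
        }
    }
}
```

Calling TryRun without a handler swallows the exception, which is the behaviour WalkSilently had; passing a handler gives the caller the same control as the original Walk.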
- As a matter of fact, null checks are the only place where I usually don't use braces - I think I should write my conventions down ;-) – t3chb0t, Nov 12, 2018 at 17:57
- I know how to refactor the ugly WalkSilently - I try to avoid nullable parameters, so I'll create a new property like public static Action<Exception> IgnoreExceptions { get; } = _ => { }; and use this as a parameter to Walk. – t3chb0t, Nov 12, 2018 at 18:01
- Good ideas. I like that IgnoreExceptions one a lot, since I don't like null myself :) – user34073, Nov 12, 2018 at 18:09
Among other points I like Hosch250's critique about the WalkSilently method. It wasn't a good idea. I've removed this API and replaced it with a new static property of the DirectoryTree. I think now it's much cleaner.
public static Action<Exception> IgnoreExceptions { get; } = _ => { };
While using it in some tools, I noticed that in some cases where I don't need a deep search, the walking operation is too slow. In order to skip deeper paths I added one more parameter:
public static Func<IDirectoryTreeNode, bool> MaxDepth(int maxDepth) => node => node.Depth < maxDepth;
public static Func<IDirectoryTreeNode, bool> Unfiltered { get; } = _ => true;
This allows me to use a max depth for the walk.
public interface IDirectoryTree
{
    [NotNull, ItemNotNull]
    IEnumerable<IDirectoryTreeNode> Walk([NotNull] string path, Func<IDirectoryTreeNode, bool> predicate, [NotNull] Action<Exception> onException);
}

public class DirectoryTree : IDirectoryTree
{
    public static Action<Exception> IgnoreExceptions { get; } = _ => { };

    public static Func<IDirectoryTreeNode, bool> Unfiltered { get; } = _ => true;

    /// <summary>
    /// Specifies the max depth of the directory tree. The upper limit is exclusive.
    /// </summary>
    public static Func<IDirectoryTreeNode, bool> MaxDepth(int maxDepth) => node => node.Depth < maxDepth;

    public IEnumerable<IDirectoryTreeNode> Walk(string path, Func<IDirectoryTreeNode, bool> predicate, Action<Exception> onException)
    {
        if (path == null) throw new ArgumentNullException(nameof(path));
        if (onException == null) throw new ArgumentNullException(nameof(onException));

        var nodes = new Queue<DirectoryTreeNode>
        {
            new DirectoryTreeNode(path)
        };

        while (nodes.Any() && nodes.Dequeue() is var current && predicate(current))
        {
            yield return current;

            try
            {
                foreach (var directory in current.DirectoryNames)
                {
                    nodes.Enqueue(new DirectoryTreeNode(Path.Combine(current.DirectoryName, directory), current.Depth + 1));
                }
            }
            catch (Exception inner)
            {
                onException(inner);
            }
        }
    }
}
I now call it like this:
directoryTree
    .Walk(
        @"c:\path",
        DirectoryTree.MaxDepth(1),
        DirectoryTree.IgnoreExceptions);
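One thing to keep in mind with the loop condition above: the whole walk ends as soon as the predicate returns false for any node. For MaxDepth that is harmless, because the queue yields nodes in non-decreasing depth, but a predicate that rejects a single directory would also cut off everything still waiting in the queue. The following is a sketch of a variant that skips only the rejected node and its subtree. It uses a hypothetical in-memory tree and plain strings instead of IDirectoryTreeNode, so the names are assumptions and not part of the API above.

```csharp
using System;
using System.Collections.Generic;

public static class PredicateWalkDemo
{
    // Hypothetical in-memory tree standing in for the file system.
    private static readonly Dictionary<string, string[]> Tree = new Dictionary<string, string[]>
    {
        ["root"] = new[] { "skip", "keep" },
        ["skip"] = new[] { "skip-child" },
        ["keep"] = new[] { "keep-child" },
    };

    public static IEnumerable<string> Walk(string root, Func<string, bool> predicate)
    {
        var pending = new Queue<string>();
        pending.Enqueue(root);

        while (pending.Count > 0)
        {
            var current = pending.Dequeue();

            // Skip this node and its subtree, but keep processing the rest of the queue.
            if (!predicate(current)) continue;

            yield return current;

            if (Tree.TryGetValue(current, out var children))
            {
                foreach (var child in children) pending.Enqueue(child);
            }
        }
    }
}
```

With the predicate n => n != "skip", this walk yields root, keep, keep-child. A loop that stops on the first rejected node would yield only root for the same input.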
- what a beast :O while (nodes.Any() && nodes.Dequeue() is var current && predicate(current)) – dfhwze, Aug 30, 2019 at 20:23
- @dfhwze haha without this cool syntax I would need to create a helper variable outside of the loop. This would be as ugly as an else if and I would need another condition for dequeue. This beast would turn into a monster. – t3chb0t, Aug 30, 2019 at 20:29
- Perhaps you enter the room with a swinging katana? :p (I just saw Kill Bill) – dfhwze, Aug 30, 2019 at 20:43
- haha can't you handle the crazy 88 alone? – dfhwze, Aug 30, 2019 at 20:46
- see you at 05:00 I guess – dfhwze, Aug 30, 2019 at 20:49