Draw a directory tree

Question 1

My goal was to create a simple-to-use CLI program for drawing directory trees (what tree does on most platforms, basically). I'm submitting my 'backend' program for review because I think it makes more sense to review the actual algorithm, rather than reviewing an interface on top of it.

import os 
import re
def tree(path, indentation=4, ignore_hidden=True, regex=None):
 """Build a directory tree starting at `path`. See also
 https://en.wikipedia.org/wiki/File:Files11_directory_hierarchy.svg
 Arguments
 * path: A relative or absolute path to use as starting point
 for the tree. [type: str]
 * indentation: The amount of spaces to indent each level with.
 Defaults to 4. [type: int]
 * ignore_hidden: Ignore hidden files (starting with a period).
 Defaults to True. [type: boolean]
 * regex: If set, only matching files will be shown.
 Set to None to show all files. Defaults to None.
 [type: str / None]
 Returns
 * A directory tree [type: str]
 Example
 Assuming we're in the 'home' directory:
 >>> import tree
 >>> t = tree.tree(path="./")
 >>> t
 ./images
 foo.jpg
 bar.jpg
 ./images/personal
 spam.png
 ./documents
 eggs.docx
 >>> t = tree.tree(path="./", ignore_hidden=False)
 >>> t
 .hidden.txt
 ./images
 foo.jpg
 bar.jpg
 .hidden.jpg
 ./images/personal
 spam.png
 ./documents
 eggs.docx
 >>> t = tree.tree("./", 8, regex="(.*\.png)")
 >>> t
 ./images
 ./images/personal
 spam.png
 ./documents
 """
 structure = "\n"
 tab = " "*indentation
 if regex is None:
 for root, directories, files in os.walk(path):
 try:
 depth = root.count(os.sep)
 offset = tab * depth
 structure += offset + root + "\n"
 except OSError:
 continue
 for f in files:
 if ignore_hidden and f.startswith("."): 
 continue
 try:
 depth = os.path.join(root, f).count(os.sep)
 offset = tab * depth
 structure += offset + f + "\n"
 except OSError:
 continue
 else:
 restriction = re.compile(regex)
 for root, directories, files in os.walk(path):
 try:
 depth = root.count(os.sep)
 offset = tab * depth
 structure += offset + root + "\n"
 except OSError:
 continue
 for f in files:
 if ignore_hidden and f.startswith("."): 
 continue
 if not re.match(restriction, f):
 continue
 try:
 depth = os.path.join(root, f).count(os.sep)
 offset = tab * depth
 structure += offset + f + "\n"
 except OSError:
 continue
 structure = structure.split("\n")
 dedented = list(map(lambda e: e.replace(tab, "", 1), structure))[1:]
 return "\n".join(dedented)

I believe my code runs in \$O(n)\$ complexity and running it on my Android phone on / takes about 9 seconds:

time python -c "import tree; tree.tree('/')"
real 0m9.365s
user 0m7.390s
sys 0m1.630s

I'd appreciate any feedback regarding performance, usability, coding style, documentation, or anything else!

Question 2

According to whether regex is None, you run code segment A or B. This feels like a copy-n-paste "bolt-on" to add the regex feature.

I suggest that you may be able to refactor A & B into a common piece of code that pays attention to regex and Does The Right Thing.

You didn't publish any automated unit tests that exercise this code. Putting together some unit tests would help you to refactor while verifying that functionality is unchanged.

You have some KPIs, and you mentioned 9s wall clock time, but did not mention how many directory entries were parsed in those 9 seconds. It would be interesting to know what sort of inputs this code expects, and where a profiler says the code spends most of its time.

Question 3

Thank you for your response! I split the function into two blocks because I figured checking for an re.match in a for-loop's a real waste of time if no regex was passed in the first place. Can you recommend another way of approaching that?

Question 4

Profile it first, then optimize for regex is None or whatever. I'd expect sys calls for os.walk() to dominate the usermode CPU time that checks a regex. Anyway, consider refactoring the common tree walking code into a generator function that repeatedly yields file entries.

Question 5

In dedented you strip TABs. If you're looking to save a few CPU cycles, perhaps you'd prefer not to insert TABs in the first place. Also, replace() is more aggressive than re.compile(r'^ +') or lstrip() -- for example a filename might be "READ[4 spaces]ME.txt".

J_H J_HJ_H 41.4k3 gold badges38 silver badges157 bronze badges · Accepted Answer · 2017-08-09 06:29:43Z

According to whether regex is None, you run code segment A or B. This feels like a copy-n-paste "bolt-on" to add the regex feature.

I suggest that you may be able to refactor A & B into a common piece of code that pays attention to regex and Does The Right Thing.

You didn't publish any automated unit tests that exercise this code. Putting together some unit tests would help you to refactor while verifying that functionality is unchanged.

You have some KPIs, and you mentioned 9s wall clock time, but did not mention how many directory entries were parsed in those 9 seconds. It would be interesting to know what sort of inputs this code expects, and where a profiler says the code spends most of its time.

Thank you for your response! I split the function into two blocks because I figured checking for an re.match in a for-loop's a real waste of time if no regex was passed in the first place. Can you recommend another way of approaching that?
Profile it first, then optimize for regex is None or whatever. I'd expect sys calls for os.walk() to dominate the usermode CPU time that checks a regex. Anyway, consider refactoring the common tree walking code into a generator function that repeatedly yields file entries.
In dedented you strip TABs. If you're looking to save a few CPU cycles, perhaps you'd prefer not to insert TABs in the first place. Also, replace() is more aggressive than re.compile(r'^ +') or lstrip() -- for example a filename might be "READ[4 spaces]ME.txt".

Stack Exchange Network

Draw a directory tree

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Draw a directory tree

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions