5
\$\begingroup\$

My goal was to create a simple-to-use CLI program for drawing directory trees (what tree does on most platforms, basically). I'm submitting my 'backend' program for review because I think it makes more sense to review the actual algorithm, rather than reviewing an interface on top of it.

import os 
import re
def tree(path, indentation=4, ignore_hidden=True, regex=None):
 """Build a directory tree starting at `path`. See also
 https://en.wikipedia.org/wiki/File:Files11_directory_hierarchy.svg
 Arguments
 * path: A relative or absolute path to use as starting point
 for the tree. [type: str]
 * indentation: The amount of spaces to indent each level with.
 Defaults to 4. [type: int]
 * ignore_hidden: Ignore hidden files (starting with a period).
 Defaults to True. [type: boolean]
 * regex: If set, only matching files will be shown.
 Set to None to show all files. Defaults to None.
 [type: str / None]
 Returns
 * A directory tree [type: str]
 Example
 Assuming we're in the 'home' directory:
 >>> import tree
 >>> t = tree.tree(path="./")
 >>> t
 ./images
 foo.jpg
 bar.jpg
 ./images/personal
 spam.png
 ./documents
 eggs.docx
 >>> t = tree.tree(path="./", ignore_hidden=False)
 >>> t
 .hidden.txt
 ./images
 foo.jpg
 bar.jpg
 .hidden.jpg
 ./images/personal
 spam.png
 ./documents
 eggs.docx
 >>> t = tree.tree("./", 8, regex="(.*\.png)")
 >>> t
 ./images
 ./images/personal
 spam.png
 ./documents
 """
 structure = "\n"
 tab = " "*indentation
 if regex is None:
 for root, directories, files in os.walk(path):
 try:
 depth = root.count(os.sep)
 offset = tab * depth
 structure += offset + root + "\n"
 except OSError:
 continue
 for f in files:
 if ignore_hidden and f.startswith("."): 
 continue
 try:
 depth = os.path.join(root, f).count(os.sep)
 offset = tab * depth
 structure += offset + f + "\n"
 except OSError:
 continue
 else:
 restriction = re.compile(regex)
 for root, directories, files in os.walk(path):
 try:
 depth = root.count(os.sep)
 offset = tab * depth
 structure += offset + root + "\n"
 except OSError:
 continue
 for f in files:
 if ignore_hidden and f.startswith("."): 
 continue
 if not re.match(restriction, f):
 continue
 try:
 depth = os.path.join(root, f).count(os.sep)
 offset = tab * depth
 structure += offset + f + "\n"
 except OSError:
 continue
 structure = structure.split("\n")
 dedented = list(map(lambda e: e.replace(tab, "", 1), structure))[1:]
 return "\n".join(dedented)

I believe my code runs in \$O(n)\$ complexity and running it on my Android phone on / takes about 9 seconds:

time python -c "import tree; tree.tree('/')"
real 0m9.365s
user 0m7.390s
sys 0m1.630s

I'd appreciate any feedback regarding performance, usability, coding style, documentation, or anything else!

200_success
145k22 gold badges190 silver badges478 bronze badges
asked Aug 8, 2017 at 20:12
\$\endgroup\$

1 Answer 1

1
\$\begingroup\$

According to whether regex is None, you run code segment A or B. This feels like a copy-n-paste "bolt-on" to add the regex feature.

I suggest that you may be able to refactor A & B into a common piece of code that pays attention to regex and Does The Right Thing.

You didn't publish any automated unit tests that exercise this code. Putting together some unit tests would help you to refactor while verifying that functionality is unchanged.

You have some KPIs, and you mentioned 9s wall clock time, but did not mention how many directory entries were parsed in those 9 seconds. It would be interesting to know what sort of inputs this code expects, and where a profiler says the code spends most of its time.

answered Aug 9, 2017 at 6:29
\$\endgroup\$
3
  • \$\begingroup\$ Thank you for your response! I split the function into two blocks because I figured checking for an re.match in a for-loop's a real waste of time if no regex was passed in the first place. Can you recommend another way of approaching that? \$\endgroup\$ Commented Aug 9, 2017 at 6:47
  • \$\begingroup\$ Profile it first, then optimize for regex is None or whatever. I'd expect sys calls for os.walk() to dominate the usermode CPU time that checks a regex. Anyway, consider refactoring the common tree walking code into a generator function that repeatedly yields file entries. \$\endgroup\$ Commented Aug 9, 2017 at 20:20
  • \$\begingroup\$ In dedented you strip TABs. If you're looking to save a few CPU cycles, perhaps you'd prefer not to insert TABs in the first place. Also, replace() is more aggressive than re.compile(r'^ +') or lstrip() -- for example a filename might be "READ[4 spaces]ME.txt". \$\endgroup\$ Commented Aug 9, 2017 at 21:04

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.