I wrote this piece of code to mimic the *nix tree utility, however, I am not happy with the pad_info=[]
part. It's used to make padding according to whether the parents were last child of their parents or not. Is there any alternate (and a more elegant) way to do this?
def tree(path, depth=1, max_depth=100, print_hidden=False, pad_info=[]):
'''Print contents of directories in a tree-like format
By default, it prints upto a depth of 100 and doesn't print hidden files,
ie, files whose name begin with a '.'
returns number of files, number of directories it encountered
'''
fileCount, dirCount = 0, 0
files = sorted(os.listdir(path), key=lambda s: s.lower())
if not print_hidden:
files = [os.path.join(path, x) for x in files if not x.startswith('.')]
for i, file in enumerate(files):
padding = ['| ' if x == 'nl' else ' ' for x in pad_info]
padding = ''.join(padding)
filename = os.path.basename(file)
is_last = i == len(files) - 1
prefix = '`-- ' if is_last else '|-- '
print '%s%s%s' % (padding, prefix, filename)
if os.path.isdir(file):
dirCount += 1
new_pad_info = pad_info + (['l'] if is_last else ['nl'])
fc, dc = tree(os.path.join(path, file), depth=depth+1,
max_depth=max_depth, print_hidden=print_hidden,
pad_info=new_pad_info)
fileCount += fc
dirCount += dc
else:
fileCount += 1
return fileCount, dirCount
1 Answer 1
Some minor style and code comments:
- Be consistent in using
snake_case
for variables names – Most names are good, but then afileCount
anddirCount
sneaks in. Also try avoid using abbreviations likefc
anddc
- Avoid building lists twice – You first build the list of
files
fromos.listdir()
, and then rebuild it usingfiles
in combination withprint_hidden
. This can be avoided using a list comprehension with an if statement like[a for a in a_list if a == something]
- Avoid one-time intermediate variables – If you use a variable just once, it can be argued that it is better to just use the expression directly. In your code this applies to
padding
,filename
andprefix
- Avoid hiding predefined variables and functions – Don't use reservered words like
file
in your code, which hides the originalfile
. - When supplying all variables, you don't need to prefix them by name – In your recursive call to
tree
you prefix them, which is not needed as the order and presences is already stated. - Instead of storing text, store a boolean in your
pad_info
– To avoid string comparisons and storage, it makes more sense to store a boolean value whether the director is the last in the current directory or not. This can further simplify your code by using a preset list to choose which text to be displayed directly asTrue = 1
andFalse = 0
when used in a list context. - Maybe a personal preferences, but use
"""
for docstrings – If you are consistently using"""
for docstrings, this allows for intermediate'
in the text, as well as enabling you to comment out larger portion of code securely using the other option of'''
. - Add option to remove output – As your code returns the file and directory counts, it could be viable to foresee a situation where you are not interested in actually viewing the output. This could be added as another option to your function.
Use internal function to hide inner implementation details - It could be argued that this is a good candidate for using a function within the function. This internal function could be the one actually handling the recursion, and only having the parameters
path
anddepth
.In other words, you could hide some of the inner workings of your function using a inner function, and let the outer function be a simple call to the inner function doing the work for you. Notice also that the inner function has full access to the outer functions variables, so need to pass them around (or treat as globals).
BUG: Your code doesnt' respect the
max_depth
variable – You don't have anything bailing out if the depth is too high...- Feature: No marking of empty directories – If reducing
max_depth
or having empty directories there is no marking to indicate that the current entry is actually a directory.
My refactored code
Here is the code when applying most of this comment to it:
import os
FOLDER_PATTERN = ['| ', ' ']
FILE_PATTERN = ['|-- ', '`-- ']
def tree(path, do_output=True, print_hidden=False, max_depth=100):
"""Print file and directory tree starting at path.
By default, it prints upto a depth of 100 and doesn't print hidden files,
ie. files whose name begin with a '.'. It can be modified to only return
the count of files and directories, and not print anything.
Returns the tuple of number of files and number of directories
"""
def _tree(path, depth):
file_count, directory_count = 0, 0
files = sorted((os.path.join(path, filename)
for filename in os.listdir(path)
if print_hidden or not filename.startswith('.')),
key=lambda s: s.lower())
files_count = len(files)
for i, filepath in enumerate(files, start = 1):
# Print current file, based on previously gathered info
if do_output:
print('{}{}{}'.format(
''.join(FOLDER_PATTERN[folder] for folder in parent_folders),
FILE_PATTERN[i == files_count],
os.path.basename(filepath)))
# Recurse if we find a new subdirectory
if os.path.isdir(filepath) and depth < max_depth:
# Append whether current directory is last in current list or not
parent_folders.append(i == files_count)
# Print subdirectory and get numbers
subdir_file_count, subdir_directory_count = \
_tree(os.path.join(filepath), depth+1)
# Back in current directory, remove the newly added directory
parent_folders.pop()
# Update counters
file_count += subdir_file_count
directory_count += subdir_directory_count + 1
elif os.path.isdir(filepath):
directory_count += 1
else:
file_count += 1
return file_count, directory_count
parent_folders = []
return _tree(path, 1)
-
\$\begingroup\$ I don't know how I missed this years ago. Thanks for the awesome answer :) \$\endgroup\$Sourabh– Sourabh2018年07月16日 10:36:18 +00:00Commented Jul 16, 2018 at 10:36