3
\$\begingroup\$

Coming from another language than Python, I would like to see if my code is "pythonic" enough and follows good practices.

It compares two directories, showing all files that are in one and not in the other. Moreover, you can add up to two options:

  • a to also compare the content of the files
  • s to save the results into a file

Note: this is in Python 2.7, and I don't have access to argparse.

import filecmp
import os.path
import sys
def print_help():
 print "Use : python cmp_dir.py <dir_1> <dir_2> [-a] [-s <file>]"
 print "dir_1 and dir_2 : directories to compare"
 print "-a : also compare data inside files"
 print "-s : save results inside <file>"
 sys.exit()
 
def show_diff_in_dir(dir_name, list_diff, file_to_save_results):
 msg = "files or subdirs only in %s : %s " % (dir_name, list_diff)
 if file_to_save_results:
 file_to_save_results.write(msg + "\n")
 else:
 print msg
 
def show_diff_in_files(dir_name, diff_files, file_to_save_results):
 msg = "different files in %s : %s " % (dir_name, diff_files)
 if file_to_save_results:
 file_to_save_results.write(msg + "\n")
 else:
 print msg
 
def compare_dir_trees(dir1, dir2, compare_file_data, file_to_save_results):
 dirs_cmp = filecmp.dircmp(dir1, dir2)
 
 if dirs_cmp.left_only:
 show_diff_in_dir(dir1, dirs_cmp.left_only, file_to_save_results)
 if dirs_cmp.right_only:
 show_diff_in_dir(dir2, dirs_cmp.right_only, file_to_save_results)
 if compare_file_data and dirs_cmp.diff_files:
 show_diff_in_files(dir1, dirs_cmp.diff_files, file_to_save_results)
 
 for common_dir in dirs_cmp.common_dir:
 new_dir1 = os.path.join(dir1, common_dir)
 new_dir2 = os.path.join(dir2, common_dir)
 compare_dir_trees(new_dir1, new_dir2, compare_file_data, file_to_save_results)
 
if __name__ == "__main__":
 compare_file_data = False
 file_to_save_results = None
 
 if len(sys.argv) < 3:
 print_help()
 else:
 dir_a = sys.argv[1]
 dir_b = sys.argv[2]
 
 if len(sys.argv) == 4:
 if sys.argv[3] == "-a":
 compare_file_data = True
 else:
 print_help()
 elif len(sys.argv) == 5:
 if sys.argv[3] == "-s":
 file_to_save_results = open(sys.argv[4], "w")
 else:
 print_help()
 elif len(sys.argv) == 6:
 if sys.argv[3] == "-a" and sys.argv[4] == "-s":
 compare_file_data = True
 file_to_save_results = open(sys.argv[5], "w")
 elif sys.argv[5] == "-a" and sys.argv[3] == "-s":
 compare_file_data = True
 file_to_save_results = open(sys.argv[4], "w")
 else:
 print_help()
 elif len(sys.argv) > 6:
 print_help()
 
 print "Compare dirs %s and %s" % (dir_a, dir_b)
 print "Start compare"
 
 compare_dir_trees(dir_a, dir_b, compare_file_data, file_to_save_results)
 
 if file_to_save_results:
 file_to_save_results.close()
 
 print "End compare"
Mast
13.8k12 gold badges57 silver badges127 bronze badges
asked Jan 7, 2022 at 13:07
\$\endgroup\$
1
  • 1
    \$\begingroup\$ Why Python 2? There are vanishingly few circumstances where this is unavoidable. \$\endgroup\$ Commented Jan 7, 2022 at 16:16

4 Answers 4

4
\$\begingroup\$

Code doesn't work

for common_dir in dirs_cmp.common_dir: fails, because dirs_cmp does not have a common_dir.

The correct attribute is common_dirs. I'm going to assume this was some sort of copy-paste error (but that would be very odd since it was in the middle of the code!).

Python 2.7 -vs- Python 3.x

Python 2.7 is "obsolete". You should move to Python 3.x instead. However, there may be times when you must use Python 2.7. In these cases, it is much better to write code which will run equally well in both interpreters. __future__ is your friend here.

In Python 2.7, you can add at the top of your script:

from __future__ import print_function

and the Python 2.7 interpreter will then allow you to use Python 3.x style print() statements.

The Python 3.x interpreter will ignore this import, because the future is now.

Don't exit

Avoid calling sys.exit() at all costs. It unconditionally terminates your Python interpreter.

Why is that a problem? If you ever want to call print_help() and then do more things, you can't. For example, unit testing frameworks will call a function and compare the expected results with the actual results. If that unit testing framework is written in Python, the unit testing framework abruptly stops.

A simple way around this is to move your mainline code into its own function. After calling print_help(), follow that with return and the main() function will exit to its caller (such as the mainline code); the Python interpreter will not be terminated until the mainline code reaches the natural end of its execution.

def main(argv):
 ...
 if len(argv) < 3:
 print_help()
 return
 ...
if __name__ == '__main__':
 main(sys.argv)

As a benefit of the return, the rest of the code in the function doesn't need to be inside an else: block, making your code more "left leaning".

Working with resources

You've got 3 open() statements and one close(). If an exception occurs at any point in the code, the close() will not happen, and a file-system resource may be leaked. It won't matter here, since you don't do any exception handling, and the file-system resource will be released by the Python interpreter, but it is better to get into good habits.

The with statement should be used to automatically close resources. The resources will be closed at the end of the with block, regardless of whether the block is exited by

  • reaching the end of the block,
  • executing an early return statement, or
  • an exception being raised.

Nested function blocks

This is an advanced topic, but it simplifies your code a lot.

You keep recursing into a function with the parameters compare_file_data, file_to_save_results passed, unchanged. The parameters aren't really part of the recursive function; rather, they are just "configuration" variables. Really, you want these parameters to be "global", but you've probably heard globals are bad to use.

Make them global, locally, with a nested function block.

def compare_dir_trees(dir1, dir2, compare_file_data, file_to_save_results):
 def show_diff_in_dir(dir_name, list_diff):
 ...
 def show_diff_in_files(dir_name, diff_files):
 ...
 def compare_dirs(dir1, dir2):
 ...
 compare_dirs(dir1, dir2)

Calling compare_dir_trees() will set the compare_file_data, file_to_save_results parameters for the entire execution of that function, which is composed of one statement: compare_dirs(dir1, dir2). That function has access to those parameters, as do the other show_diff_in_ functions.

Standard Output -vs- Output to File

sys.stdout is a file handle. You can write to it just like you can write to file_to_save_results. You just don't need to open or close it.

This means your show_diff_in_ functions can be simplied to just writing to the passed in file handle, and not worry about choosing to print or write. Actually, print(..., file=file_to_save_results) would be even simplier, as it takes care of the "\n" for you.

Argument Parsing

If you look for the optional arguments -a and -s <filename> in sys.argv, and remove them if found, you will simplify argument parsing. Specifically, you don't need to handle -a -s <filename> differently from -s <filename> -a!

Reworked Code

from __future__ import print_function
import filecmp
import os.path
import sys
def print_help():
 print("Use : python cmp_dir.py <dir_1> <dir_2> [-a] [-s <file>]")
 print("dir_1 and dir_2 : directories to compare")
 print("-a : also compare data inside files")
 print("-s : save results inside <file>")
def compare_dir_trees(dir1, dir2, compare_file_data, output):
 def compare_dirs(dir1, dir2):
 dirs_cmp = filecmp.dircmp(dir1, dir2)
 
 if dirs_cmp.left_only:
 print("files or subdirs only in %s : %s " % (dir1, dirs_cmp.left_only),
 file=output)
 if dirs_cmp.right_only:
 print("files or subdirs only in %s : %s " % (dir2, dirs_cmp.right_only),
 file=output)
 if compare_file_data and dirs_cmp.diff_files:
 print("different files in %s : %s " % (dir1, dirs_cmp.diff_files),
 file=output)
 
 for common_dir in dirs_cmp.common_dirs:
 new_dir1 = os.path.join(dir1, common_dir)
 new_dir2 = os.path.join(dir2, common_dir)
 compare_dir(new_dir1, new_dir2)
 compare_dirs(dir1, dir2)
def main(argv):
 compare_file_data = False
 file_to_save_results = None
 if '-s' in argv:
 pos = argv.index('-s')
 if pos + 1 < len(argv):
 file_to_save_results = argv[pos + 1]
 del argv[pos:pos+2]
 if '-a' in argv:
 argv.remove('-a')
 compare_file_data = True
 if len(argv) != 3:
 print_help()
 return
 
 dir_a = argv[1]
 dir_b = argv[2]
 
 print("Compare dirs %s and %s" % (dir_a, dir_b))
 print("Start compare")
 if file_to_save_results:
 with open(file_to_save_results, "w") as file:
 compare_dir_trees(dir_a, dir_b, compare_file_data, file)
 else:
 compare_dir_trees(dir_a, dir_b, compare_file_data, sys.stdout)
 print("End compare")
 
if __name__ == '__main__':
 main(sys.argv)

This works with both Python 2.7 and Python 3.x

answered Jan 7, 2022 at 20:52
\$\endgroup\$
3
\$\begingroup\$

First of all, you really should stop using python 2, since it was sunset about 2 years ago. Outside of that, I'll try to keep my feedback somewhat balanced in regards to use of python 2, and I'll focus on what you request - ensuring that the code is "pythonic".


The way you're parsing arguments is clearly not pythonic. You say you don't have access to argparse but I don't understand why you wouldn't.

The way you're storing default arguments is also a bit odd; instead of defining

 compare_file_data = False
 file_to_save_results = None

under an if __name__ == "__main__": you could just as well set the default in the functions themselves.

Finally, I don't like how you rely on side-effects in your functions. I think it's better that you modify strings or whatever other data structure you decide to use instead of having conditional print statements. If nothing else it would greatly help you with debugging.

answered Jan 7, 2022 at 13:52
\$\endgroup\$
2
  • \$\begingroup\$ First of all, thank you for the answer. I don't understand how storing the default parameters in the function would help, since they still have to be retrieved from main, right? Finally I don't understand why the code relies on a side-effect either? The message to be displayed is the same, either in the console or in a file (but maybe that just means that my variable names are poorly chosen and unclear) \$\endgroup\$ Commented Jan 7, 2022 at 14:03
  • 1
    \$\begingroup\$ For the default parameters, you'd generally do def show_diff_in_dir(dir_name, list_diff, file_to_save_results=False): and then you'd call on this with show_diff_in_dir(dir_name, list_diff, file_to_save_results=True) when necessary. As for the side-effects, I mean it's poor practices to print in your functions instead of just returning a string and then dealing with the printing in your compare function; show_diff_in_files would then be diff_in_files. There's nothing hugely wrong with your variable names. \$\endgroup\$ Commented Jan 7, 2022 at 14:08
2
\$\begingroup\$

Don't use Python 2. You've tagged this beginner: beginners should not be learning deprecated technology.

Partially echo the feedback from @ades. Use argparse; use f-string interpolation instead of old-school percent-formatting. Don't have conditions on your file_to_save_results; instead unconditionally write to a file-like that defaults to stdout.

I find the direct list printing to be a little unsightly; consider using join instead.

Use pathlib instead of os.path.

I don't find the Start compare and End compare banners to be helpful and I think they can be dropped.

Add type hints.

Suggested

from argparse import ArgumentParser, Namespace
from filecmp import dircmp
from functools import partial
from pathlib import Path
from sys import stdout
from typing import TextIO
def parse_args() -> Namespace:
 parser = ArgumentParser(description='compare directories')
 parser.add_argument('dir_1', type=Path, help='first directory to compare')
 parser.add_argument('dir_2', type=Path, help='second directory to compare')
 parser.add_argument('-a', '--compare-file-data', action='store_true', help='also compare data inside files')
 parser.add_argument('-s', '--save-to', type=partial(open, mode='w'),
 default=stdout, help='safe results inside SAVE_TO')
 return parser.parse_args()
def show_diff_in_dir(dir_name: Path, list_diff: list[str], save_to: TextIO) -> None:
 if list_diff:
 msg = f'files or subdirs only in {dir_name}: {", ".join(list_diff)}'
 print(msg, file=save_to)
def show_diff_in_files(dir_name: Path, diff_files: list[str], save_to: TextIO) -> None:
 print(f'different files in {dir_name}: {", ".join(diff_files)}', file=save_to)
def compare_dir_trees(dir_1: Path, dir_2: Path, compare_file_data: bool, save_to: TextIO) -> None:
 dirs_cmp = dircmp(dir_1, dir_2)
 show_diff_in_dir(dir_1, dirs_cmp.left_only, save_to)
 show_diff_in_dir(dir_2, dirs_cmp.right_only, save_to)
 if compare_file_data and dirs_cmp.diff_files:
 show_diff_in_files(dir_1, dirs_cmp.diff_files, save_to)
 for common_dir in dirs_cmp.common_dirs:
 compare_dir_trees(dir_1 / common_dir, dir_2 / common_dir, compare_file_data, save_to)
def main() -> None:
 args = parse_args()
 print(f'Compare dirs {args.dir_1} and {args.dir_2}')
 compare_dir_trees(**args.__dict__)
if __name__ == "__main__":
 main()
answered Jan 7, 2022 at 20:17
\$\endgroup\$
1
\$\begingroup\$

I'll try not to repeat existing answers. Let's first note two things:

  • When someone (you included) reads the if statements at the top of compare_dir_trees, they might not be sure whether some of them were meant to be elifs. An important part of clean code is to make what's right obviously right.
  • Apart from the value of msg, show_diff_in_dir and show_diff_in_files currently do the same thing.

Let's refactor the aforementioned functions to reduce repetition and the number of explicit if statements in compare_dir_trees. I start by introducing a helper function:

def show_diff(dir_name, prefix, last_msg_arg, file_to_save_results):
 if not last_msg_arg: return
 msg = "%s %s: %s" % (prefix, dir_name, last_msg_arg)
 # Some answers advised changing behaviour below,
 # so calls to show_diff return its result, which is currently None
 if file_to_save_results:
 file_to_save_results.write(msg + "\n")
 else:
 print msg
def show_diff_in_dir(dir_name, list_diff, file_to_save_results):
 return show_diff(dir_name, "files or subdirs only in", list_diff, file_to_save_results)
def show_diff_in_files(dir_name, diff_files, file_to_save_results):
 return show_diff(dir_name, "different files in", diff_files, file_to_save_results)
def compare_dir_trees(dir1, dir2, compare_file_data, file_to_save_results):
 dirs_cmp = filecmp.dircmp(dir1, dir2)
 show_diff_in_dir(dir1, dirs_cmp.left_only, file_to_save_results)
 show_diff_in_dir(dir2, dirs_cmp.right_only, file_to_save_results)
 if compare_file_data:
 show_diff_in_files(dir1, dirs_cmp.diff_files, file_to_save_results)
 # Rest of function

The first comment above (which you shouldn't keep!) explains how two returns make the code I've suggested more maintainable.

Several answers have advised you not to use Python 2.7. I realize you might be stuck with it, but if you one day can use at least 3.4 I can make another suggestion: replace os.path.join(dir1, common_dir) with pathlib.Path(dir1)/common_dir, and similarly with dir2. This allows a nice improvement to the for loop in compare_dir_trees.

answered Jan 8, 2022 at 8:14
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.