Compare two directories, apply the changes, backup and restore

Question 1

I’ve written a small utility tool capable of

backing up a directory,
restoring a backed up directory,
syncing two directories (copying missing or changed files from src to dst).

I’d love your feedback - any code smells, improvements, potential bugs, non-idiomatic approaches, etc., please let me know.

Here's how I backup and revert:

SHA1_DIGEST_REGEX = "[a-fA-F0-9]{40}"
def backup(directory, repo=REPO, backup_file=BACKUP):
 if not os.path.isdir(repo):
 os.makedirs(repo)
 key = str(time.time())
 sha = hashlib.sha1(key).hexdigest()
 archive_path = os.path.join(repo, sha)
 assert not os.path.isfile(archive_path)
 shutil.make_archive(archive_path, "zip", directory)
 backup_path = os.path.join(directory, backup_file)
 with open(backup_path, 'w') as b:
 b.write(sha)
def revert(directory, repo=REPO, backup_file=BACKUP):
 try:
 with open(os.path.join(directory, backup_file)) as b:
 sha = b.readline()
 if re.compile(SHA1_DIGEST_REGEX).match(sha):
 backup_path = os.path.join(repo, sha + ".zip")
 if not os.path.isfile(backup_path):
 raise IOError
 if os.listdir(directory):
 shutil.rmtree(directory)
 with zipfile.ZipFile(backup_path) as zf:
 zf.extractall(directory)
 else:
 print "Backup file corrupted."
 except IOError:
 print "Either no history is available or the backup files are missing."

Comparing two directories and applying the changes:

def find_diff(src, dst):
 if os.path.isdir(dst):
 diff = filecmp.dircmp(src, dst)
 return diff.left_only + diff.diff_files
 else:
 return [f for f in os.listdir(src)]
def apply_diff(src, dst, diff_list=None, auto_backup=True):
 if diff_list is None:
 diff_list = find_diff(src, dst)
 if diff_list:
 if auto_backup:
 backup(dst)
 for item in diff_list:
 src_item = os.path.join(src, item)
 dst_item = os.path.join(dst, item)
 if os.path.isdir(src_item):
 shutil.copytree(src_item, dst_item)
 else:
 shutil.copy(src_item, dst_item)

Sync (read multiple src dirs from a file):

def sync(directory, sync_file=SYNC):
 try:
 with open(os.path.join(directory, sync_file)) as s:
 src_list = s.read().splitlines()
 if src_list:
 for src in src_list:
 if not os.path.isdir(src):
 raise IOError
 backup(directory)
 for src in src_list:
 apply_diff(src, directory, auto_backup=False)
 else:
 print "Sync file is empty."
 except IOError:
 print "Sync file is either missing or contains invalid sources."

I hope the code speaks for itself, but I'm happy to clarify things if they don't make sense.

Question 2

This question is 10 years old. I'll attempt to review it, but a lot has happened since. For example, where in 2015 it was (strongly) discouraged to stick to Python 2 any new project starting today should absolutely avoid it. And that's coming from someone who got stuck to 2 much longer than he liked. In my experience, any library that hasn't been ported from 2 to 3 in the meantime has an alternative that's superior most of the time. Your print statements are a dead giveaway.

I'm missing a lot in this question. Imports are not listed. Some of the pseudo-globals are missing. It's unclear whether the functions share a file or are split up. Are they called by various other (automated) scripts or by a utility program ran by the user manually? I'm assuming the reverts aren't automated, but the backups might be. Lacking this information, I understand why people have been hesitant to review an otherwise fairly straight-forward piece of code. It does mean that I am not going to fully test your code (I've been burned by trying to 'guess' what the rest looks like before) and will only provide some general remarks.

It looks as if your backup is placed in a folder that's part of the repo. That's unfortunate to say the least, since each future backup would contain the old backup(s). Which is going to ruin your diffs. If so, this would be a big problem for me with this code. The backups should be stored in a separate location. Preferably a dedicated folder that only contains these backups. Timestamped with at least the date (yyyy-mm-dd) in the zip files if you ever want more than one backup of the same repo.

Then some lesser remarks.

Instead of:

if re.compile(SHA1_DIGEST_REGEX).match(sha):

you can store the compiled regex straight in the SHA1_DIGEST_REGEX variable, turning your code into the much cleaner

if SHA1_DIGEST_REGEX.match(sha):

instead. Arguably the _REGEX post-fix is not required.

If you are using this program only for yourself, you may not want to hide any bugs you didn't account for with your except IOError: only printing a standard message. Print the entire error with it:

except IOError as e:
 print("Sync file is either missing or contains invalid sources.")
 print(e)

Note that in Python 3 IOError is now simply an alias of OSError and will now catch both the 'old' IOError and OSError. Which could bite you when upgrading your code to Python 3 since with your code you'll now hide bugs you wouldn't have hidden before.

If you aren't using this program just by yourself but give it out to more people, you need a better description of what your functions do and what's expected of the people using it in regards to how data is provided to your program for it to work. At that time it becomes more important to look into subclasses when catching errors, such as FileNotFoundError and PermissionError. Find the list of subclasses here.

Naturally, in modern Python code the whole thing would at least get a docstring per function regardless of who uses it, see PEP-257, and something vaguely resembling a docstring at the top of the main document (especially if you don't have a README).

Mast ♦Mast 13.8k12 gold badges57 silver badges127 bronze badges · Answer 1 · 2025-07-30 08:58:17Z

This question is 10 years old. I'll attempt to review it, but a lot has happened since. For example, where in 2015 it was (strongly) discouraged to stick to Python 2 any new project starting today should absolutely avoid it. And that's coming from someone who got stuck to 2 much longer than he liked. In my experience, any library that hasn't been ported from 2 to 3 in the meantime has an alternative that's superior most of the time. Your print statements are a dead giveaway.

I'm missing a lot in this question. Imports are not listed. Some of the pseudo-globals are missing. It's unclear whether the functions share a file or are split up. Are they called by various other (automated) scripts or by a utility program ran by the user manually? I'm assuming the reverts aren't automated, but the backups might be. Lacking this information, I understand why people have been hesitant to review an otherwise fairly straight-forward piece of code. It does mean that I am not going to fully test your code (I've been burned by trying to 'guess' what the rest looks like before) and will only provide some general remarks.

It looks as if your backup is placed in a folder that's part of the repo. That's unfortunate to say the least, since each future backup would contain the old backup(s). Which is going to ruin your diffs. If so, this would be a big problem for me with this code. The backups should be stored in a separate location. Preferably a dedicated folder that only contains these backups. Timestamped with at least the date (yyyy-mm-dd) in the zip files if you ever want more than one backup of the same repo.

Then some lesser remarks.

Instead of:

if re.compile(SHA1_DIGEST_REGEX).match(sha):

you can store the compiled regex straight in the SHA1_DIGEST_REGEX variable, turning your code into the much cleaner

if SHA1_DIGEST_REGEX.match(sha):

instead. Arguably the _REGEX post-fix is not required.

If you are using this program only for yourself, you may not want to hide any bugs you didn't account for with your except IOError: only printing a standard message. Print the entire error with it:

except IOError as e:
 print("Sync file is either missing or contains invalid sources.")
 print(e)

Note that in Python 3 IOError is now simply an alias of OSError and will now catch both the 'old' IOError and OSError. Which could bite you when upgrading your code to Python 3 since with your code you'll now hide bugs you wouldn't have hidden before.

If you aren't using this program just by yourself but give it out to more people, you need a better description of what your functions do and what's expected of the people using it in regards to how data is provided to your program for it to work. At that time it becomes more important to look into subclasses when catching errors, such as FileNotFoundError and PermissionError. Find the list of subclasses here.

Naturally, in modern Python code the whole thing would at least get a docstring per function regardless of who uses it, see PEP-257, and something vaguely resembling a docstring at the top of the main document (especially if you don't have a README).

Stack Exchange Network

Compare two directories, apply the changes, backup and restore

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Compare two directories, apply the changes, backup and restore

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions