I’ve written a small utility tool capable of
- backing up a directory,
- restoring a backed up directory,
- syncing two directories (copying missing or changed files from
src
todst
).
I’d love your feedback - any code smells, improvements, potential bugs, non-idiomatic approaches, etc., please let me know.
Here's how I backup and revert:
SHA1_DIGEST_REGEX = "[a-fA-F0-9]{40}"
def backup(directory, repo=REPO, backup_file=BACKUP):
if not os.path.isdir(repo):
os.makedirs(repo)
key = str(time.time())
sha = hashlib.sha1(key).hexdigest()
archive_path = os.path.join(repo, sha)
assert not os.path.isfile(archive_path)
shutil.make_archive(archive_path, "zip", directory)
backup_path = os.path.join(directory, backup_file)
with open(backup_path, 'w') as b:
b.write(sha)
def revert(directory, repo=REPO, backup_file=BACKUP):
try:
with open(os.path.join(directory, backup_file)) as b:
sha = b.readline()
if re.compile(SHA1_DIGEST_REGEX).match(sha):
backup_path = os.path.join(repo, sha + ".zip")
if not os.path.isfile(backup_path):
raise IOError
if os.listdir(directory):
shutil.rmtree(directory)
with zipfile.ZipFile(backup_path) as zf:
zf.extractall(directory)
else:
print "Backup file corrupted."
except IOError:
print "Either no history is available or the backup files are missing."
Comparing two directories and applying the changes:
def find_diff(src, dst):
if os.path.isdir(dst):
diff = filecmp.dircmp(src, dst)
return diff.left_only + diff.diff_files
else:
return [f for f in os.listdir(src)]
def apply_diff(src, dst, diff_list=None, auto_backup=True):
if diff_list is None:
diff_list = find_diff(src, dst)
if diff_list:
if auto_backup:
backup(dst)
for item in diff_list:
src_item = os.path.join(src, item)
dst_item = os.path.join(dst, item)
if os.path.isdir(src_item):
shutil.copytree(src_item, dst_item)
else:
shutil.copy(src_item, dst_item)
Sync (read multiple src dirs from a file):
def sync(directory, sync_file=SYNC):
try:
with open(os.path.join(directory, sync_file)) as s:
src_list = s.read().splitlines()
if src_list:
for src in src_list:
if not os.path.isdir(src):
raise IOError
backup(directory)
for src in src_list:
apply_diff(src, directory, auto_backup=False)
else:
print "Sync file is empty."
except IOError:
print "Sync file is either missing or contains invalid sources."
I hope the code speaks for itself, but I'm happy to clarify things if they don't make sense.
1 Answer 1
This question is 10 years old. I'll attempt to review it, but a lot has happened since. For example, where in 2015 it was (strongly) discouraged to stick to Python 2 any new project starting today should absolutely avoid it. And that's coming from someone who got stuck to 2 much longer than he liked. In my experience, any library that hasn't been ported from 2 to 3 in the meantime has an alternative that's superior most of the time. Your print
statements are a dead giveaway.
I'm missing a lot in this question. Imports are not listed. Some of the pseudo-globals are missing. It's unclear whether the functions share a file or are split up. Are they called by various other (automated) scripts or by a utility program ran by the user manually? I'm assuming the reverts aren't automated, but the backups might be. Lacking this information, I understand why people have been hesitant to review an otherwise fairly straight-forward piece of code. It does mean that I am not going to fully test your code (I've been burned by trying to 'guess' what the rest looks like before) and will only provide some general remarks.
It looks as if your backup is placed in a folder that's part of the repo. That's unfortunate to say the least, since each future backup would contain the old backup(s). Which is going to ruin your diffs. If so, this would be a big problem for me with this code. The backups should be stored in a separate location. Preferably a dedicated folder that only contains these backups. Timestamped with at least the date (yyyy-mm-dd) in the zip files if you ever want more than one backup of the same repo.
Then some lesser remarks.
Instead of:
if re.compile(SHA1_DIGEST_REGEX).match(sha):
you can store the compiled regex straight in the SHA1_DIGEST_REGEX
variable, turning your code into the much cleaner
if SHA1_DIGEST_REGEX.match(sha):
instead. Arguably the _REGEX
post-fix is not required.
If you are using this program only for yourself, you may not want to hide any bugs you didn't account for with your except IOError:
only printing a standard message. Print the entire error with it:
except IOError as e:
print("Sync file is either missing or contains invalid sources.")
print(e)
Note that in Python 3 IOError
is now simply an alias of OSError
and will now catch both the 'old' IOError
and OSError
. Which could bite you when upgrading your code to Python 3 since with your code you'll now hide bugs you wouldn't have hidden before.
If you aren't using this program just by yourself but give it out to more people, you need a better description of what your functions do and what's expected of the people using it in regards to how data is provided to your program for it to work. At that time it becomes more important to look into subclasses when catching errors, such as FileNotFoundError
and PermissionError
. Find the list of subclasses here.
Naturally, in modern Python code the whole thing would at least get a docstring per function regardless of who uses it, see PEP-257, and something vaguely resembling a docstring at the top of the main document (especially if you don't have a README).