I wrote the script below as my first python program that will be running in a production environment. Essentially this program looks at a source location and moves all the files to a destination location if the file is older than 14 days old and does not exists in the destination. I had a user on stack overflow suggest some variable name changes and I am in the process of doing that. I would love if someone could review this code and let me know if anyone sees any potential bugs or errors that I might run into.
import os
import shutil
import time
import errno
import time
import sys
import logging
import logging.config
source = r'C:\Users\Desktop\BetaSource'
dest = r'C:\Users\Desktop\BetaDest'
#Gets the current time from the time module
now = time.time()
#Timer of when to purge files
cutoff = now - (14 * 86400)
source_list = []
all_sources = []
all_dest_dirty = []
logging.basicConfig(level = logging.INFO, filename = time.strftime("main-%Y-%m-%d.log"))
def main():
dest_files()
purge_files()
#I used the dess_files function to get all of the destination files
def dest_files():
for dest_root, dest_subdirs, dest_files in os.walk(dest):
for f in dest_files:
global All_dest_dirty
all_dest_dirty.append(f)
def purge_files():
logging.info('invoke purge_files method')
#I removed all duplicates from dest because cleaning up duplicates in dest is out of the scope
all_dest_clean = list(dict.fromkeys(all_dest_dirty))
#os.walk used to get all files in the source location
for source_root, source_subdirs, source_files in os.walk(source):
#looped through every file in source_files
for f in source_files:
#appending all_sources to get the application name from the file path
all_sources.append(os.path.abspath(f).split('\\')[-1])
#looping through each element of all_source
for i in all_sources:
#logical check to see if file in the source folder exists in the destination folder
if i not in all_dest_clean:
#src is used to get the path of the source file this will be needed to move the file in shutil.move
src = os.path.abspath(os.path.join(source_root, i))
#the two variables used below are to get the creation time of the files
t = os.stat(src)
c = t.st_ctime
#logical check to see if the file is older than the cutoff
if c<cutoff:
logging.info(f'File has been succesfully moved: {i}')
print(f'File has been succesfully moved: {i}')
shutil.move(src,dest)
#removing the allready checked source files for the list this is also used in other spots within the loop
all_sources.remove(i)
else:
logging.info(f'File is not older than 14 days: {i}')
print(f'File is not older than 14 days: {i}')
all_sources.remove(i)
else:
all_sources.remove(i)
logging.info(f'File: {i} allready exists in the destination')
print(f'File: {i} allready exists in the destination')
if __name__ == '__main__':
main()
1 Answer 1
This is just a brain dump of some things about your code that I hope you find useful.
Automated tools
Automated tools can help you maintain your code more easily.
The first ones I install whenever I start a new Python project, are:
- black — The uncompromising Python code formatter
- isort — A Python utility to sort imports
- flake8 — A tool to enforce PEP 8 compliance (among other things)
I ran them on your code with the following options:
pipenv run black clean_old_files.py
pipenv run flake8 --max-line-length 88 clean_old_files.py
pipenv run isort clean_old_files.py -m=3 -tc -l=88
(An aside, I'm using Pipenv to manage my virtual environments. They're unmissable once you start getting into larger projects with external dependencies, and other contributors.)
Noisy comments
You have several comments that explain what the code does, rather than why.
For instance this:
# Gets the current time from the time module
now = time.time()
# Timer of when to purge files
cutoff = now - (14 * 86400)
which could be rewritten as:
# Timer of when to purge files
cutoff = time.time() - (14 * 86400)
which removes one "what" comment, and inlines a temporary variable.
My advice would be to go over all the comments in your script, and see which are "what" comments, and which are "why" comments.
Code structure
A minor thing to start off, I'd move the main
function to the bottom of the script, right before if __name__ == "__main__"
, which, conceptually makes most sense to me.
You are mutating global variables from functions, which makes it harder to follow your program structure as-is.
On a related note, you also have global All_dest_dirty
, which is a typo.
Globals
Instead of, for instance:
all_dest_dirty = []
def dest_files():
for dest_root, dest_subdirs, dest_files in os.walk(dest):
for f in dest_files:
global all_dest_dirty
all_dest_dirty.append(f)
prefer:
def dest_files():
dirty_files = []
for dest_root, dest_subdirs, dest_files in os.walk(dest):
for f in dest_files:
dirty_files.append(f)
return dirty_files
all_dest_dirty = dest_files()
Closing remarks
Correct me if I'm wrong, but from
all_dest_clean = list(dict.fromkeys(all_dest_dirty))
I think that you're trying to remove duplicates from a list?
You can write that as
all_dest_clean = set(all_dest_dirty)
which uses the set
data type, which prevents any duplicates. Plus, it's less typing, and also clearer. :-)
I hope you found my mini review helpful. You're doing great for a beginner! Keep it up. Also, let me know if anything is unclear.
-
\$\begingroup\$ That was a great answer! I love your feedback. I’ll definitely go through and create those functions to return values rather than manipulating global variables. Also id I turned all_dest_clean into a set would I still be able to check if I is not in all_dest_clean? Im not super familiar what the difference between using a list compared to a set. \$\endgroup\$TheMuellTrain– TheMuellTrain2019年06月02日 18:35:14 +00:00Commented Jun 2, 2019 at 18:35
-
1\$\begingroup\$ @TheMuellTrain Thank you! Re:
would I still be able to check if I is not in all_dest_clean
Yup! That's still possible. \$\endgroup\$grooveplex– grooveplex2019年06月02日 20:20:33 +00:00Commented Jun 2, 2019 at 20:20 -
\$\begingroup\$ I have modified the
dest_files()
function toreturn dirty_files
. My question now is what is the best practice for definingall_dest_dirty = dest_files()
I can not define it at the top of my program with the rest of the variables. Is it common to just define this right after the function? \$\endgroup\$TheMuellTrain– TheMuellTrain2019年06月03日 15:24:39 +00:00Commented Jun 3, 2019 at 15:24
Explore related questions
See similar questions with these tags.