This is a (relatively) cleaned-up version of a Python script I've been working on for the past day or two based on Tim Golden's first solution to the above problem. I know about watchdog but I'm trying to keep dependencies down to a minimum.
I'm concerned about how Pythonic my code is, as this is the first "real" script I've worked on. I'm also unsure of whether I have any redundancies (like all the if not ignored else None
s) or inefficiencies in my code.
Any comments on how I can properly refactor the script are appreciated.
ianus.py
import os
def ianus(path, interval=60, recursive=True, ignore_dirs=False, ignore_files=False):
import time
def path_join(root, dest):
return os.path.join(root, dest)
def path_replace(path):
return path.replace('\\', '/') + ['', '/'][os.path.isdir(path)]
def pathify(root, dest):
return path_replace(path_join(root, dest))
def mod_time(root, dest):
return os.stat(pathify(root, dest)).st_mtime
def build(path, recursive=True, ignore_dirs=False, ignore_files=False):
def flatten(list):
return [item for sublist in list for item in sublist]
if recursive:
walk = list(os.walk(path)) if recursive else None
rdirs = flatten(
[[pathify(root, dir) for dir in dirs] for (root, dirs) in \
[(root, dirs) for (root, dirs, files) in walk]]) \
if not ignore_dirs else None
rfiles = flatten(
[[pathify(root, f) for f in files] for (root, files) in \
[(root, files) for (root, dirs, files) in walk]]) \
if not ignore_files else None
else:
l = [pathify(path, u) for u in os.listdir(path)]
rdirs = [d for d in l if os.path.isdir(d)] \
if not ignore_dirs else None
rfiles = [f for f in l if os.path.isfile(f)] \
if not ignore_files else None
return rdirs, rfiles
path = path_replace(path)
print 'Watching ' + path + '...'
print '---'
dirs_before, files_before = build(path, recursive, ignore_files, ignore_dirs)
dir_times = [(d, mod_time(path, d)) for d in dirs_before] \
if not ignore_dirs else None
file_times = [(f, mod_time(path, f)) for f in files_before] \
if not ignore_files else None
while True:
time.sleep(interval)
dirs_after, files_after = build(path, recursive, ignore_dirs, ignore_files)
new_dir_times = [(d, mod_time(path, d)) for d in dirs_after] \
if not ignore_dirs else None
new_file_times = [(f, mod_time(path, f)) for f in files_after] \
if not ignore_files else None
msg = [None, None]
if not ignore_dirs:
dirs_added = [d for d in dirs_after if not d in dirs_before]
dirs_removed = [d for d in dirs_before if not d in dirs_after]
dirs_updated = [d[0] for d in new_dir_times if not \
((d in dir_times) or (d[0] in files_added))]
msg[0] = (dirs_added, dirs_removed, dirs_updated)
if not ignore_files:
files_added = [f for f in files_after if not f in files_before]
files_removed = [f for f in files_before if not f in files_after]
files_updated = [f[0] for f in new_file_times if not \
((f in file_times) or (f[0] in files_added))]
msg[1] = (files_added, files_removed, files_updated)
print msg
print '---'
dirs_before = dirs_after
files_before = files_after
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('-p', type=str,
help='Set the path to be watched.')
parser.add_argument('-intr', type=int,
help='Set the poll interval of the watching thread.')
parser.add_argument('-rec',
help='Checks all subdirectories for changes.')
parser.add_argument('--ignd',
help='Ignores directories. Leaves msg[0] as None.',
action='store_true')
parser.add_argument('--ignf',
help='Ignores files. Leaves msg[1] as None.',
action='store_true')
args = parser.parse_args()
path = os.getcwd()
interval = 60
recursive, ignore_dirs, ignore_files = False, False, False
if args.p:
if os.path.isdir(args.p):
path = args.p
else:
print 'Not a valid directory.'
sys.exit(1)
if args.intr:
if args.intr < 10:
print 'Too short an interval.'
sys.exit(1)
else:
interval = args.intr
if args.rec:
recursive = True
if args.ignd:
ignore_dirs = True
if args.ignf:
ignore_files = True
if(ignore_dirs and ignore_files):
print 'Both directories and files are ignored. Nothing is watched.'
sys.exit(1)
ianus(path, interval, recursive, ignore_dirs, ignore_files)
-
\$\begingroup\$ Readers may be interested in an alternative solutions to this problem, such as this one, and the others on the same page: superuser.com/a/970780/57697 \$\endgroup\$Jonathan Hartley– Jonathan Hartley2016年10月10日 18:14:15 +00:00Commented Oct 10, 2016 at 18:14
2 Answers 2
There are no docstrings. What do your functions do, what arguments do they take, and what do they return?
It's not clear to me what I could use your program for. The output is produced like this:
print msg print '---'
where
msg
is a complicated Python data structure (a list of tuples of lists of strings). There's no easy way for me to use this output in a shell script: it would take too much parsing.So what are the use cases for your program?
Your code does not run under Python 3.
You've defined a bunch of functions inside the function
ianus
. Presumably this is for reasons of "privacy" — the functions are only called in this context, so you have defined them there. However, in Python we generally don't define local functions like this unless we have a better reason than "privacy" (such as use of variables from the local scope). It would be simpler to define these functions at top level.path_join
is just a wrapper aroundos.path.join
and so seems hardly worth defining. But if you really insisted on having such an alias, you could create it like this:path_join = os.path.join
or like this:
from os.path import join as path_join
You call
pathify
all over the place but I doubt this is necessary. The paths mostly come fromos.walk
and so should be fine without this kind of manipulation.A function that does the same thing as your
flatten
is already in the standard library:itertools.chain.from_iterable
.You always test
not ignore_files
andnot ignore_dirs
. This suggests that these Booleans are the wrong way round and you should have named themwatch_files
andwatch_dirs
instead.You set
rfiles
toNone
ifignore_files
is set. This means you have to keep testingif not ignore_files
. But if you had setrfiles
to the empty list in that case, then you could skip the test in several places (because iterating over the empty list would yield no files).You don't need to use backslashes to continue lines inside brackets. So the backslash here is unnecessary:
dirs_updated = [d[0] for d in new_dir_times if not \ ((d in dir_times) or (d[0] in files_added))]
The parentheses here are unnecessary:
if(ignore_dirs and ignore_files):
(Python is not C.)
After collecting files and directories into separate lists
dirs_after
andfiles_after
, you treat them almost identically. The only difference is that changes to directories go intomsg[0]
and changes to files go intomsg[1]
. Is this distinction really necessary? If not, why not just keep one list containing all watched paths, and simplify the code accordingly.This code seems over-complicated, with double list comprehensions:
if recursive: walk = list(os.walk(path)) if recursive else None rdirs = flatten( [[pathify(root, dir) for dir in dirs] for (root, dirs) in \ [(root, dirs) for (root, dirs, files) in walk]]) \ if not ignore_dirs else None rfiles = flatten( [[pathify(root, f) for f in files] for (root, files) in \ [(root, files) for (root, dirs, files) in walk]]) \ if not ignore_files else None else: l = [pathify(path, u) for u in os.listdir(path)] rdirs = [d for d in l if os.path.isdir(d)] \ if not ignore_dirs else None rfiles = [f for f in l if os.path.isfile(f)] \ if not ignore_files else None
I would write it like this:
watched_paths = [] if recursive: for root, dirs, files in os.walk(path): if watch_dirs: watched_paths.extend(os.path.join(root, d) for d in dirs) if watch_files: watched_paths.extend(os.path.join(root, f) for f in files) else: for entry in os.listdir(path): entry = os.path.join(path, entry) if (watch_dirs and os.path.isdir(entry) or watch_files and os.path.isfile(entry)): watched_paths.append(entry)
You've spelled the arguments
-intr
and-rec
with a single hyphen. Generally you should reserve single hyphens for options with just one letter. So I would change these to-i
and-r
respectively (with long versions--interval
and--recursive
).In your argument parsing code, you should use the
default
keyword toArgumentParser.add_argument
to simplify the argument processing. For example, instead of:parser.add_argument('-rec', help='Checks all subdirectories for changes.') parser.add_argument('--ignd', help='Ignores directories. Leaves msg[0] as None.', action='store_true') parser.add_argument('--ignf', help='Ignores files. Leaves msg[1] as None.', action='store_true') # ... recursive, ignore_dirs, ignore_files = False, False, False # ... if args.rec: recursive = True if args.ignd: ignore_dirs = True if args.ignf: ignore_files = True
` you could write:
parser.add_argument('-rec', default=False, help='Checks all subdirectories for changes.', action='store_true') parser.add_argument('--ignd', default=False, help='Ignores directories. Leaves msg[0] as None.', action='store_true') parser.add_argument('--ignf', default=False, help='Ignores files. Leaves msg[1] as None.', action='store_true')
though as explained above I would recommend reversing the sense of the option variables and write something like this instead:
parser.add_argument('-r', '--recursive', default=False, help='Checks all subdirectories for changes.', action='store_true') parser.add_argument('--ignore-dirs', dest='watch_dirs', default=True, help='Ignore directories?', action='store_false') parser.add_argument('--ignore-files', dest='watch_files', default=True, help='Ignore files?', action='store_false')
-
\$\begingroup\$ Thank you! That was very comprehensive and helpful. If you wouldn't mind, I still have a few questions to address to you. \$\endgroup\$user3294504– user32945042014年02月12日 10:13:17 +00:00Commented Feb 12, 2014 at 10:13
I think it's generally good! There are a few little things that stand out to me as being repetitive and not quite Pythonic.
For instance the block at the end could be rewritten to avoid all the extra assignments. Unless you think they add readability. In which case I would suggest that the argument names themselves should be changed.
if(args.ignd and args.ignf):
print 'Both directories and files are ignored. Nothing is watched.'
sys.exit(1)
ianus(args.p, args.intr, args.rec, args.ignd, args.ignf)
By default, if an optional argument isn't given to argparse it is given the value of None. Which in Python is falsy. So in the block below you don't need the third line at all.
path = os.getcwd()
interval = 60
recursive, ignore_dirs, ignore_files = False, False, False
The other two lines could be made default arguments.
parser.add_argument('-p', type=str, default=os.getcwd(),
help='Set the path to be watched.')
parser.add_argument('-intr', type=int, default=60,
help='Set the poll interval of the watching thread.')
It's generally accepted in Python that imports go at the top of the module/script unless you have a really good reason for delaying the import. I can't see one in this case.
You have a helper function called path_join
. Which is fine. But it could just as easily have been written like so
path_join = os.path.join
No need to create a new function that calls an existing one with exactly the same signature. If you want to call it something else for readability just assign it to a variable.
One thing that most Python people attempt to do is write in a highly readable style. Generally avoiding shorter variable names and instead attempting to use full nouns. For instance I might write this block
l = [pathify(path, u) for u in os.listdir(path)]
As
contents = [pathify(path, child) for child in os.listdir(path)]
For the same reason most people agree that double negatives are harder to read/grok. So I might change my variable names and set the defaults to True rather than False, like below:
rfiles = [f for f in l if os.path.isfile(f)] \
if watch_files else None
-
\$\begingroup\$ Thanks! Concerning imports, I thought that you import what you need in a function so that when you need to import that function in another script, that library gets imported as well. Is that not the case? \$\endgroup\$user3294504– user32945042014年02月12日 10:14:39 +00:00Commented Feb 12, 2014 at 10:14
-
\$\begingroup\$ That is not the case. In Python there is no way to import a function from a module without loading the module itself. For the interpreter to even be aware of a function definition in a module it has to load it. \$\endgroup\$aychedee– aychedee2014年02月12日 11:35:45 +00:00Commented Feb 12, 2014 at 11:35