I've written a script that recursively goes through a directory tree and looks for folders containing folders with NOT INVOICED
in the folder name. If it finds one it prints the name of the containing folder and the names of any folders inside it with NOT INVOICED
in the folder name.
The output is:
FOLDER 1 NOT INVOICED FOLDER 1 NOT INVOICED FOLDER 2 FOLDER 3 NOT INVOICED FOLDER 1
The idea is that FOLDER 1
contains a few other folders that don't have NOT INVOICED
in their name, FOLDER 2
doesn't contain any folders with NOT INVOICED
in their name, and so on.
It works, but the only problem is this is what it looks like:
import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
for dirName, subdirList,fileList in os.walk(rootDir):
for fname in subdirList:
if re.match("NOT INVOICED", fname):
f.write(dirName + '\n')
for i in subdirList:
if re.match("NOT INVOICED", i):
f.write('\t%s\n' % i)
else:
pass
else:
pass
There's got to be a way to write this without six indentations.
3 Answers 3
It looks like your script's output will look more like this:
FOLDER 1
NOT INVOICED FOLDER 1
NOT INVOICED FOLDER 2
FOLDER 1
NOT INVOICED FOLDER 1
NOT INVOICED FOLDER 2
FOLDER 3
NOT INVOICED FOLDER 1
It happens because for each subdirectory which contains NOT INVOICED
you'll print containing directory name and all subdirectories. It can be easily fixed by adding an extra break
in the end of the first if
.
Now to the question. First of all, PEP 8 recommends using 4 spaces per one ident, neither tabs, nor 8 spaces. Also, you are free to omit the else
branch in your if
s - no need in pass
and extra lines.
I would separate checking of whether the directory should be printed (with some subdirectories) and the actual printing - that would kill inner for
loop. Moreover, it will also make it impossible to output a directory twice:
import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
for dirName, subdirList,fileList in os.walk(rootDir):
shouldPrint = False
for fname in subdirList:
if re.match("NOT INVOICED", fname):
shouldPrint = True
if shouldPrint:
f.write(dirName + '\n')
for i in subdirList:
if re.match("NOT INVOICED", i):
f.write('\t%s\n' % i)
Next things that I'd do are:
- Calculate not invoiced subdirectories just once with list-comprehension. That will remove redundant
for
+if
pair and also lower the risk of misspelling one of regexes. - Replace
fileList
variable with wildcard as it's unused - Run
pep8
tool (can be installed withpip install pep8
) on the code to make sure it corresponds with PEP 8. It'll report some whitespace issues in the very firstfor
loop. - Make sure that only only style of quotes is used across the code -
'
or"
.
And now we have:
import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
for dirName, subdirList, _ in os.walk(rootDir):
notInvoiced = [fname for fname
in subdirList
if re.match('NOT INVOICED', fname)]
if notInvoiced:
f.write(dirName + '\n')
for i in notInvoiced:
f.write('\t%s\n' % i)
Also, if you're Python 3 user, I'd recommend using str.format()
instead of %
operator - the latter is kind of deprecated.
-
\$\begingroup\$ Thanks. I didn't even realize that it was producing the wrong output when I ran it through the folders the first time. I was unfamiliar with the
for in if
structure. What is gained by replacing the variable with a wildcard? \$\endgroup\$Marc Adler– Marc Adler2016年05月14日 20:40:29 +00:00Commented May 14, 2016 at 20:40 -
\$\begingroup\$ @MarcAdler if variable has a name, I personally expect it to be used somewhere. If I don't see any usages, I get a little worried - is it really unused or it's just inattentive me? Matter of taste, I think. Some people will probably say that it's better to name all variables which participate in same multiple assignment because it allows future code to see which information is available. \$\endgroup\$yeputons– yeputons2016年05月14日 20:44:23 +00:00Commented May 14, 2016 at 20:44
-
\$\begingroup\$ Makes sense. I thought it might've been a question of efficiency. Thanks. \$\endgroup\$Marc Adler– Marc Adler2016年05月14日 22:06:35 +00:00Commented May 14, 2016 at 22:06
-
\$\begingroup\$ @Mark Adler You may reduce one morr layer of indentation by replacing
for i in notInvoiced: f.write('\t%s\n' % i)
with a list comprehension,join
and a singlef.write
call \$\endgroup\$Caridorc– Caridorc2016年05月15日 08:59:33 +00:00Commented May 15, 2016 at 8:59
Remove else: pass
else: pass
does nothing so you can just remove it. This will make the code structure more regular and plain simpler. While not reducing the indentation this will simplify reading and modifyng.
Beside the excellent suggestions already given, I suggest replacing
re.match('NOT INVOICED', fname)
with
fname.startswith('NOT INVOICED')
A regular expression is overkill for this.
Also: re.match
only matches at the start. If you actually want to match anywhere, use
re.search('NOT INVOICED', fname)
or better:
'NOT INVOICED' in fname