3
\$\begingroup\$

I've written a script that recursively goes through a directory tree and looks for folders containing folders with NOT INVOICED in the folder name. If it finds one it prints the name of the containing folder and the names of any folders inside it with NOT INVOICED in the folder name.

The output is:

FOLDER 1
 NOT INVOICED FOLDER 1
 NOT INVOICED FOLDER 2
FOLDER 3
 NOT INVOICED FOLDER 1

The idea is that FOLDER 1 contains a few other folders that don't have NOT INVOICED in their name, FOLDER 2 doesn't contain any folders with NOT INVOICED in their name, and so on.

It works, but the only problem is this is what it looks like:

import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
 for dirName, subdirList,fileList in os.walk(rootDir):
 for fname in subdirList:
 if re.match("NOT INVOICED", fname):
 f.write(dirName + '\n')
 for i in subdirList:
 if re.match("NOT INVOICED", i):
 f.write('\t%s\n' % i)
 else:
 pass
 else:
 pass

There's got to be a way to write this without six indentations.

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked May 14, 2016 at 17:37
\$\endgroup\$

3 Answers 3

4
\$\begingroup\$

It looks like your script's output will look more like this:

FOLDER 1
 NOT INVOICED FOLDER 1
 NOT INVOICED FOLDER 2
FOLDER 1
 NOT INVOICED FOLDER 1
 NOT INVOICED FOLDER 2
FOLDER 3
 NOT INVOICED FOLDER 1

It happens because for each subdirectory which contains NOT INVOICED you'll print containing directory name and all subdirectories. It can be easily fixed by adding an extra break in the end of the first if.

Now to the question. First of all, PEP 8 recommends using 4 spaces per one ident, neither tabs, nor 8 spaces. Also, you are free to omit the else branch in your ifs - no need in pass and extra lines.

I would separate checking of whether the directory should be printed (with some subdirectories) and the actual printing - that would kill inner for loop. Moreover, it will also make it impossible to output a directory twice:

import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
 for dirName, subdirList,fileList in os.walk(rootDir):
 shouldPrint = False
 for fname in subdirList:
 if re.match("NOT INVOICED", fname):
 shouldPrint = True
 if shouldPrint:
 f.write(dirName + '\n')
 for i in subdirList:
 if re.match("NOT INVOICED", i):
 f.write('\t%s\n' % i)

Next things that I'd do are:

  1. Calculate not invoiced subdirectories just once with list-comprehension. That will remove redundant for+if pair and also lower the risk of misspelling one of regexes.
  2. Replace fileList variable with wildcard as it's unused
  3. Run pep8 tool (can be installed with pip install pep8) on the code to make sure it corresponds with PEP 8. It'll report some whitespace issues in the very first for loop.
  4. Make sure that only only style of quotes is used across the code - ' or ".

And now we have:

import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
 for dirName, subdirList, _ in os.walk(rootDir):
 notInvoiced = [fname for fname
 in subdirList
 if re.match('NOT INVOICED', fname)]
 if notInvoiced:
 f.write(dirName + '\n')
 for i in notInvoiced:
 f.write('\t%s\n' % i)

Also, if you're Python 3 user, I'd recommend using str.format() instead of % operator - the latter is kind of deprecated.

answered May 14, 2016 at 18:34
\$\endgroup\$
4
  • \$\begingroup\$ Thanks. I didn't even realize that it was producing the wrong output when I ran it through the folders the first time. I was unfamiliar with the for in if structure. What is gained by replacing the variable with a wildcard? \$\endgroup\$ Commented May 14, 2016 at 20:40
  • \$\begingroup\$ @MarcAdler if variable has a name, I personally expect it to be used somewhere. If I don't see any usages, I get a little worried - is it really unused or it's just inattentive me? Matter of taste, I think. Some people will probably say that it's better to name all variables which participate in same multiple assignment because it allows future code to see which information is available. \$\endgroup\$ Commented May 14, 2016 at 20:44
  • \$\begingroup\$ Makes sense. I thought it might've been a question of efficiency. Thanks. \$\endgroup\$ Commented May 14, 2016 at 22:06
  • \$\begingroup\$ @Mark Adler You may reduce one morr layer of indentation by replacing for i in notInvoiced: f.write('\t%s\n' % i) with a list comprehension, join and a single f.write call \$\endgroup\$ Commented May 15, 2016 at 8:59
2
\$\begingroup\$

Remove else: pass

else: pass does nothing so you can just remove it. This will make the code structure more regular and plain simpler. While not reducing the indentation this will simplify reading and modifyng.

answered May 14, 2016 at 18:19
\$\endgroup\$
2
\$\begingroup\$

Beside the excellent suggestions already given, I suggest replacing

re.match('NOT INVOICED', fname)

with

fname.startswith('NOT INVOICED')

A regular expression is overkill for this.

Also: re.match only matches at the start. If you actually want to match anywhere, use

re.search('NOT INVOICED', fname)

or better:

'NOT INVOICED' in fname
answered May 15, 2016 at 6:56
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.