Looking through a directory tree for certain folders

Question 1

I've written a script that recursively goes through a directory tree and looks for folders containing folders with NOT INVOICED in the folder name. If it finds one it prints the name of the containing folder and the names of any folders inside it with NOT INVOICED in the folder name.

The output is:

FOLDER 1
 NOT INVOICED FOLDER 1
 NOT INVOICED FOLDER 2
FOLDER 3
 NOT INVOICED FOLDER 1

The idea is that FOLDER 1 contains a few other folders that don't have NOT INVOICED in their name, FOLDER 2 doesn't contain any folders with NOT INVOICED in their name, and so on.

It works, but the only problem is this is what it looks like:

import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
 for dirName, subdirList,fileList in os.walk(rootDir):
 for fname in subdirList:
 if re.match("NOT INVOICED", fname):
 f.write(dirName + '\n')
 for i in subdirList:
 if re.match("NOT INVOICED", i):
 f.write('\t%s\n' % i)
 else:
 pass
 else:
 pass

There's got to be a way to write this without six indentations.

Question 2

It looks like your script's output will look more like this:

FOLDER 1
 NOT INVOICED FOLDER 1
 NOT INVOICED FOLDER 2
FOLDER 1
 NOT INVOICED FOLDER 1
 NOT INVOICED FOLDER 2
FOLDER 3
 NOT INVOICED FOLDER 1

It happens because for each subdirectory which contains NOT INVOICED you'll print containing directory name and all subdirectories. It can be easily fixed by adding an extra break in the end of the first if.

Now to the question. First of all, PEP 8 recommends using 4 spaces per one ident, neither tabs, nor 8 spaces. Also, you are free to omit the else branch in your ifs - no need in pass and extra lines.

I would separate checking of whether the directory should be printed (with some subdirectories) and the actual printing - that would kill inner for loop. Moreover, it will also make it impossible to output a directory twice:

import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
 for dirName, subdirList,fileList in os.walk(rootDir):
 shouldPrint = False
 for fname in subdirList:
 if re.match("NOT INVOICED", fname):
 shouldPrint = True
 if shouldPrint:
 f.write(dirName + '\n')
 for i in subdirList:
 if re.match("NOT INVOICED", i):
 f.write('\t%s\n' % i)

Next things that I'd do are:

Calculate not invoiced subdirectories just once with list-comprehension. That will remove redundant for+if pair and also lower the risk of misspelling one of regexes.
Replace fileList variable with wildcard as it's unused
Run pep8 tool (can be installed with pip install pep8) on the code to make sure it corresponds with PEP 8. It'll report some whitespace issues in the very first for loop.
Make sure that only only style of quotes is used across the code - ' or ".

And now we have:

import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
 for dirName, subdirList, _ in os.walk(rootDir):
 notInvoiced = [fname for fname
 in subdirList
 if re.match('NOT INVOICED', fname)]
 if notInvoiced:
 f.write(dirName + '\n')
 for i in notInvoiced:
 f.write('\t%s\n' % i)

Also, if you're Python 3 user, I'd recommend using str.format() instead of % operator - the latter is kind of deprecated.

Question 3

Thanks. I didn't even realize that it was producing the wrong output when I ran it through the folders the first time. I was unfamiliar with the for in if structure. What is gained by replacing the variable with a wildcard?

Question 4

@MarcAdler if variable has a name, I personally expect it to be used somewhere. If I don't see any usages, I get a little worried - is it really unused or it's just inattentive me? Matter of taste, I think. Some people will probably say that it's better to name all variables which participate in same multiple assignment because it allows future code to see which information is available.

Question 5

Makes sense. I thought it might've been a question of efficiency. Thanks.

Question 6

@Mark Adler You may reduce one morr layer of indentation by replacing for i in notInvoiced: f.write('\t%s\n' % i) with a list comprehension, join and a single f.write call

Question 7

Remove `else: pass`

else: pass does nothing so you can just remove it. This will make the code structure more regular and plain simpler. While not reducing the indentation this will simplify reading and modifyng.

Question 8

Beside the excellent suggestions already given, I suggest replacing

re.match('NOT INVOICED', fname)

with

fname.startswith('NOT INVOICED')

A regular expression is overkill for this.

Also: re.match only matches at the start. If you actually want to match anywhere, use

re.search('NOT INVOICED', fname)

or better:

'NOT INVOICED' in fname

yeputons yeputons 2911 silver badge3 bronze badges · Accepted Answer · 2016-05-14 18:34:21Z

It looks like your script's output will look more like this:

FOLDER 1
 NOT INVOICED FOLDER 1
 NOT INVOICED FOLDER 2
FOLDER 1
 NOT INVOICED FOLDER 1
 NOT INVOICED FOLDER 2
FOLDER 3
 NOT INVOICED FOLDER 1

It happens because for each subdirectory which contains NOT INVOICED you'll print containing directory name and all subdirectories. It can be easily fixed by adding an extra break in the end of the first if.

Now to the question. First of all, PEP 8 recommends using 4 spaces per one ident, neither tabs, nor 8 spaces. Also, you are free to omit the else branch in your ifs - no need in pass and extra lines.

I would separate checking of whether the directory should be printed (with some subdirectories) and the actual printing - that would kill inner for loop. Moreover, it will also make it impossible to output a directory twice:

import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
 for dirName, subdirList,fileList in os.walk(rootDir):
 shouldPrint = False
 for fname in subdirList:
 if re.match("NOT INVOICED", fname):
 shouldPrint = True
 if shouldPrint:
 f.write(dirName + '\n')
 for i in subdirList:
 if re.match("NOT INVOICED", i):
 f.write('\t%s\n' % i)

Next things that I'd do are:

Calculate not invoiced subdirectories just once with list-comprehension. That will remove redundant for+if pair and also lower the risk of misspelling one of regexes.
Replace fileList variable with wildcard as it's unused
Run pep8 tool (can be installed with pip install pep8) on the code to make sure it corresponds with PEP 8. It'll report some whitespace issues in the very first for loop.
Make sure that only only style of quotes is used across the code - ' or ".

And now we have:

import os
import re
rootDir = '.'
with open('names.txt', 'w') as f:
 for dirName, subdirList, _ in os.walk(rootDir):
 notInvoiced = [fname for fname
 in subdirList
 if re.match('NOT INVOICED', fname)]
 if notInvoiced:
 f.write(dirName + '\n')
 for i in notInvoiced:
 f.write('\t%s\n' % i)

Also, if you're Python 3 user, I'd recommend using str.format() instead of % operator - the latter is kind of deprecated.

Thanks. I didn't even realize that it was producing the wrong output when I ran it through the folders the first time. I was unfamiliar with the for in if structure. What is gained by replacing the variable with a wildcard?
@MarcAdler if variable has a name, I personally expect it to be used somewhere. If I don't see any usages, I get a little worried - is it really unused or it's just inattentive me? Matter of taste, I think. Some people will probably say that it's better to name all variables which participate in same multiple assignment because it allows future code to see which information is available.
Makes sense. I thought it might've been a question of efficiency. Thanks.
@Mark Adler You may reduce one morr layer of indentation by replacing for i in notInvoiced: f.write('\t%s\n' % i) with a list comprehension, join and a single f.write call

Stack Exchange Network

Looking through a directory tree for certain folders

3 Answers 3

Remove `else: pass`

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Looking through a directory tree for certain folders

3 Answers 3

Remove else: pass

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Remove `else: pass`