I haven't used regex much, and I needed to write a script that gathers a list of file paths that should adhere to a strict formatting convention, so it sounded like a good opportunity to use them.
To explain a bit: there is a set of sequence folders inside a root folder, and each sequence contains a set of scene folders. Each scene contains a set of folders with constant names, the relevant one being "AEP Files". Inside that is a set of After Effects files, of which I want the highest-numbered version, denoted by _v##.aep at the end of the file name. A sample path might look like this:
P:\ProjectName\Scenes\e10_q04\e10_q04_s12\AEP Files\proj_e10_q04_s12_v04.aep
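For illustration, here is a minimal sketch of one pattern that validates the whole file name from the sample path and captures every number in it. It assumes names always have the shape proj_e#_q#_s#_v#.aep with one or more digits in each slot; AEP_NAME and parse_aep_name are hypothetical names.

```python
import re

# Hypothetical pattern, assuming names shaped like proj_e10_q04_s12_v04.aep.
# \d+ requires at least one digit; fullmatch anchors at both ends.
AEP_NAME = re.compile(r'proj_e(\d+)_q(\d+)_s(\d+)_v(\d+)\.aep', re.IGNORECASE)


def parse_aep_name(name):
    """Return the e, q, s and v numbers as ints, or None if name doesn't fit."""
    match = AEP_NAME.fullmatch(name)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


print(parse_aep_name('proj_e10_q04_s12_v04.aep'))      # (10, 4, 12, 4)
print(parse_aep_name('proj_e10_q04_s12_v04.aep.bak'))  # None
```

Converting the captured version to int is what makes v10 compare above v9; comparing the strings only works while every version is zero-padded to the same width.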
I'm particularly interested to know:
- whether I'm using regex correctly;
- whether I should use something other than if re.match(...);
- whether I could make it more efficient (particularly the list comprehensions); and
- given the complexity, how readable it currently is.
I had ideas for making it more efficient, but they involved collapsing the list comprehensions, and I thought that would make an unreadable mess, so I opted not to. Any other feedback is also welcome!
import os
import re

root = r'P:\ProjectName\Scenes'
IGNORE = re.IGNORECASE

# Sequence folders, e.g. e10_q04
folders = [os.path.join(root, f) for f in os.listdir(root)
           if re.match(r'e\d*_q\d*', f, IGNORE)]
folders.sort()

# Scene folders inside each sequence, e.g. e10_q04_s12
scene_folders = [os.path.join(folder, f) for folder in folders
                 for f in os.listdir(folder)
                 if re.match(r'e\d*_q\d*_s\d*', f, IGNORE)]
scene_folders.sort()

scenes = []
missing_scenes = []
for folder in scene_folders:
    aep_folder = os.path.join(folder, "AEP Files")
    # After Effects files, e.g. proj_e10_q04_s12_v04.aep
    matches = [re.match(r'proj_e\d*_q\d*_s\d*_v(\d*)\.aep', f, IGNORE)
               for f in os.listdir(aep_folder)
               if re.match(r'proj_e\d*_q\d*_s\d*_v(\d*)\.aep', f, IGNORE)]
    # (file name, version string) pairs
    matches = [(match.group(), match.groups()[0]) for match in matches]
    if matches:
        # Sorting by name picks the highest version only while versions are zero-padded
        scenes.append(os.path.join(aep_folder, sorted(matches)[-1][0]))
    else:
        missing_scenes.append(folder)
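On the question of whether something other than re.match(...) fits here: re.match anchors the pattern only at the start of the string, so a name with trailing text, such as a backup copy, would still be accepted. The sketch below compares match, fullmatch and search on a few file names; the backup and renamed files are made up for illustration, and AEP_RE and how_it_matches are hypothetical names. It uses \d+ rather than \d*, because \d* also accepts zero digits.

```python
import re

# Same shape as the sample file name; \d+ demands at least one digit.
AEP_RE = re.compile(r'proj_e\d+_q\d+_s\d+_v(\d+)\.aep', re.IGNORECASE)


def how_it_matches(name):
    """Return the (match, fullmatch, search) results for name as booleans."""
    return (bool(AEP_RE.match(name)),      # anchored at the start only
            bool(AEP_RE.fullmatch(name)),  # anchored at both ends
            bool(AEP_RE.search(name)))     # may start anywhere


for name in ('proj_e10_q04_s12_v04.aep',
             'proj_e10_q04_s12_v04.aep.bak',   # hypothetical backup copy
             'old_proj_e10_q04_s12_v03.aep'):  # hypothetical renamed file
    print(name, how_it_matches(name))
```

Only fullmatch rejects the backup copy. Compiling the pattern once mainly helps readability; the re module already caches recently used patterns, so the speed difference is usually small.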
- root should be capitalized (ROOT), as it is a constant.
- You can shorten the code by using sorted, which returns a new list, instead of sorting in place.
- In my opinion IGNORE reduces readability, as it is less obvious than IGNORECASE.
- Your regexes are pretty hard to read; I would extract them into constants and give them names.
- You make good use of list comprehensions; they shorten and simplify the code. Good job!
- Performance may not be a primary concern here, but it is a good habit to use lazy generators: putting round parentheses instead of square brackets around the first matches comprehension will not build the list, saving memory and time. The same goes for scene_folders.
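The suggestions above can be combined into one sketch. It assumes folder and file names follow the sample path exactly; ROOT, AEP_DIR, the *_RE constants and both functions are hypothetical names. It uses fullmatch so trailing text is rejected, picks the newest file by comparing version numbers as integers rather than sorting strings, and treats a scene without an AEP Files folder as missing, which is an assumption.

```python
import os
import re

# Constants are upper case; the shapes assume the sample path, e.g.
# P:\ProjectName\Scenes\e10_q04\e10_q04_s12\AEP Files\proj_e10_q04_s12_v04.aep
ROOT = r'P:\ProjectName\Scenes'
AEP_DIR = 'AEP Files'
SEQUENCE_RE = re.compile(r'e\d+_q\d+', re.IGNORECASE)
SCENE_RE = re.compile(r'e\d+_q\d+_s\d+', re.IGNORECASE)
AEP_RE = re.compile(r'proj_e\d+_q\d+_s\d+_v(\d+)\.aep', re.IGNORECASE)


def subfolders(folder, pattern):
    """Lazily yield sorted child directories whose names fully match pattern."""
    return (os.path.join(folder, name) for name in sorted(os.listdir(folder))
            if pattern.fullmatch(name)
            and os.path.isdir(os.path.join(folder, name)))


def latest_scenes(root=ROOT):
    """Return (paths of the newest .aep per scene, scenes with none)."""
    scenes, missing_scenes = [], []
    for sequence in subfolders(root, SEQUENCE_RE):
        for scene in subfolders(sequence, SCENE_RE):
            aep_dir = os.path.join(scene, AEP_DIR)
            # Assumption: a scene without an AEP Files folder counts as missing.
            names = os.listdir(aep_dir) if os.path.isdir(aep_dir) else []
            # filter/map are lazy and call fullmatch only once per name.
            matches = filter(None, map(AEP_RE.fullmatch, names))
            # Compare versions as integers so that v10 beats v9.
            latest = max(matches, key=lambda m: int(m.group(1)), default=None)
            if latest is None:
                missing_scenes.append(scene)
            else:
                scenes.append(os.path.join(aep_dir, latest.group()))
    return scenes, missing_scenes
```

Calling scenes, missing_scenes = latest_scenes() runs it against ROOT; passing another root makes it easy to try on a scratch folder first.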
Well, here's an example. Since it'd be hard for me to replicate your file system, I used my own and included its directory listing along with example usage:
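import glob
import os

def dirs_missing_files(mask, extension):
    # every path matching the mask, e.g. './*'
    directories = glob.glob(mask)
    # parent directories of every file with the given extension
    nonempty = map(os.path.dirname, glob.glob(mask + '/*.' + extension))
    # directories that contain no such file
    return set(directories) - set(nonempty)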
Directory listing:
/home/wvxvw/Documents/uni:
total used in directory 56 available 39216308
drwxrwxr-x. 2 wvxvw wvxvw 4096 Aug 18 22:30 automata-theory
-rw-------. 1 wvxvw wvxvw 60 Mar 27 19:56 .directory
drwxrwxr-x. 3 wvxvw wvxvw 12288 Jul 10 18:46 discrete-mathematics
drwxrwxr-x. 4 wvxvw wvxvw 4096 Jun 27 19:27 infinitesimal-calculus
drwxrwxr-x. 6 wvxvw wvxvw 4096 Aug 17 17:13 intro-to-java
drwxrwxr-x. 4 wvxvw wvxvw 4096 Feb 14 2015 intro-to-math
drwxrwxr-x. 5 wvxvw wvxvw 4096 Jul 15 14:26 intro-to-statistics
drwxrwxr-x. 4 wvxvw wvxvw 4096 May 9 13:46 linear-algebra-1
/home/wvxvw/Documents/uni/automata-theory:
total used in directory 280 available 39216308
-rw-r-----. 1 wvxvw wvxvw 272279 Aug 18 22:30 syllabus.pdf
/home/wvxvw/Documents/uni/discrete-mathematics:
total used in directory 4524 available 39216308
-rw-rw-r--. 1 wvxvw wvxvw 12219 Mar 26 00:34 assignment-11.org
-rw-rw-r--. 1 wvxvw wvxvw 256614 May 16 13:05 assignment-11.pdf
-rw-rw-r--. 1 wvxvw wvxvw 13588 Apr 19 00:52 assignment-12.org
-rw-rw-r--. 1 wvxvw wvxvw 252578 Apr 19 00:52 assignment-12.pdf
-rw-rw-r--. 1 wvxvw wvxvw 10193 May 5 21:14 assignment-13.org
-rw-rw-r--. 1 wvxvw wvxvw 231959 May 5 21:14 assignment-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 9577 Jul 10 18:46 assignment-14.org
-rw-rw-r--. 1 wvxvw wvxvw 241745 Jul 10 18:46 assignment-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 11861 Jun 5 12:54 assignment-15.org
-rw-rw-r--. 1 wvxvw wvxvw 222046 Jun 5 12:54 assignment-15.pdf
-rw-rw-r--. 1 wvxvw wvxvw 11012 Jun 13 19:37 assignment-16.org
-rw-rw-r--. 1 wvxvw wvxvw 191311 Jun 13 19:37 assignment-16.pdf
-rw-rw-r--. 1 wvxvw wvxvw 6124 Jun 15 13:33 eulerian-graph-no-perfect-matching.png
drwxrwxr-x. 8 wvxvw wvxvw 4096 Jul 10 18:46 .git
-rw-rw-r--. 1 wvxvw wvxvw 650 Jun 15 14:07 .gitignore
-rw-rw-r--. 1 wvxvw wvxvw 9359 Jun 3 18:54 helpers.pl
-rw-r-----. 1 wvxvw wvxvw 476042 Mar 21 16:20 problems.pdf
/home/wvxvw/Documents/uni/infinitesimal-calculus:
total used in directory 3816 available 39216308
-rw-rw-r--. 1 wvxvw wvxvw 25952 Mar 21 13:03 assignment-11.org
-rw-rw-r--. 1 wvxvw wvxvw 235003 Mar 21 13:03 assignment-11.pdf
-rw-rw-r--. 1 wvxvw wvxvw 17516 Apr 16 09:17 assignment-12.org
-rw-rw-r--. 1 wvxvw wvxvw 235187 Apr 16 09:17 assignment-12.pdf
-rw-rw-r--. 1 wvxvw wvxvw 16124 May 4 22:46 assignment-13.org
-rw-rw-r--. 1 wvxvw wvxvw 236487 May 4 22:46 assignment-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 22196 May 21 02:35 assignment-14.org
-rw-rw-r--. 1 wvxvw wvxvw 331419 May 21 02:35 assignment-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 16690 Jun 20 23:26 assignment-17.org
-rw-rw-r--. 1 wvxvw wvxvw 248733 Jun 20 23:26 assignment-17.pdf
-rw-rw-r--. 1 wvxvw wvxvw 6575 Jun 27 19:27 assignment-18.org
-rw-rw-r--. 1 wvxvw wvxvw 174323 Jun 27 19:27 assignment-18.pdf
drwxrwxr-x. 8 wvxvw wvxvw 4096 Aug 21 16:35 .git
-rw-rw-r--. 1 wvxvw wvxvw 249 May 21 02:38 .gitignore
-rw-rw-r--. 1 wvxvw wvxvw 744 Mar 14 16:59 helpers.lisp
drwxrwxr-x. 2 wvxvw wvxvw 4096 May 21 02:35 images
-rw-r-----. 1 wvxvw wvxvw 92350 Apr 3 15:48 problems-12.pdf
-rw-r-----. 1 wvxvw wvxvw 40325 Apr 23 19:00 problems-13.pdf
-rw-r-----. 1 wvxvw wvxvw 37484 May 9 11:32 problems-14.pdf
-rw-r-----. 1 wvxvw wvxvw 37945 Jun 19 12:30 problems-17.pdf
-rw-r-----. 1 wvxvw wvxvw 39231 Jun 19 12:31 problems-18.pdf
-rw-r-----. 1 wvxvw wvxvw 35856 Jun 19 12:31 problems-19.pdf
/home/wvxvw/Documents/uni/intro-to-java:
total used in directory 292 available 39216308
drwxrwxr-x. 2 wvxvw wvxvw 4096 Aug 18 17:58 etc
drwxrwxr-x. 8 wvxvw wvxvw 4096 Aug 18 18:03 .git
-rw-rw-r--. 1 wvxvw wvxvw 10 Aug 15 14:32 .gitignore
-rw-r-----. 1 wvxvw wvxvw 249345 Aug 10 20:44 java-silibus.pdf
-rw-rw-r--. 1 wvxvw wvxvw 2393 Aug 16 12:54 README.org
drwxrwxr-x. 4 wvxvw wvxvw 4096 Aug 15 13:34 src
drwxrwxr-x. 12 wvxvw wvxvw 4096 Aug 18 18:02 target
/home/wvxvw/Documents/uni/intro-to-statistics:
total used in directory 4524 available 39216308
-rw-rw-r--. 1 wvxvw wvxvw 314127 Apr 1 21:11 assignment-11.doc
-rw-rw-r--. 1 wvxvw wvxvw 21077 Apr 1 21:09 assignment-11.org
-rw-rw-r--. 1 wvxvw wvxvw 294074 Apr 1 21:10 assignment-11.pdf
-rw-rw-r--. 1 wvxvw wvxvw 22752 Apr 15 23:01 assignment-12.org
-rw-rw-r--. 1 wvxvw wvxvw 277692 Apr 15 23:01 assignment-12.pdf
-rw-rw-r--. 1 wvxvw wvxvw 14053 May 3 13:14 assignment-13.org
-rw-rw-r--. 1 wvxvw wvxvw 270695 May 3 13:14 assignment-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 11482 May 28 15:58 assignment-14.org
-rw-rw-r--. 1 wvxvw wvxvw 208160 May 28 15:58 assignment-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 17096 Jun 18 14:24 assignment-15.org
-rw-rw-r--. 1 wvxvw wvxvw 253588 Jun 18 14:25 assignment-15.pdf
-rw-rw-r--. 1 wvxvw wvxvw 3382 Apr 1 22:07 bogus1.html
-rw-rw-r--. 1 wvxvw wvxvw 3184 Apr 1 21:53 bogus.html
-rw-------. 1 wvxvw wvxvw 60 Mar 27 19:56 .directory
-rw-r-----. 1 wvxvw wvxvw 91041 Jul 15 14:25 exam-1.pdf
-rw-r-----. 1 wvxvw wvxvw 1448545 Jul 15 14:26 exam1-solution.pdf
drwxrwxr-x. 8 wvxvw wvxvw 4096 Jun 15 14:08 .git
-rw-rw-r--. 1 wvxvw wvxvw 143 Apr 10 22:45 .gitignore
drwxrwxr-x. 2 wvxvw wvxvw 4096 Mar 27 19:57 helpers
-rw-rw-r--. 1 wvxvw wvxvw 6919 Apr 14 23:34 helpers.lisp
-rw-rw-r--. 1 wvxvw wvxvw 3859 May 25 23:28 helpers.py
drwxrwxr-x. 2 wvxvw wvxvw 4096 Apr 1 20:12 images
-rw-r-----. 1 wvxvw wvxvw 111674 Mar 27 10:39 problems-11.pdf
-rw-r-----. 1 wvxvw wvxvw 77819 Apr 10 20:20 problems-12.pdf
-rw-r-----. 1 wvxvw wvxvw 77599 Apr 23 11:51 problems-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 119288 May 25 16:57 problems-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 90508 Jun 15 14:06 problems-15.pdf
-rwxrwxr-x. 1 wvxvw wvxvw 785 Apr 1 22:06 submit.sh
/home/wvxvw/Documents/uni/linear-algebra-1:
total used in directory 3152 available 39216236
-rw-rw-r--. 1 wvxvw wvxvw 28703 Nov 8 2014 assignment-11a.html
-rw-rw-r--. 1 wvxvw wvxvw 25346 Nov 8 2014 assignment-11a.org
-rw-rw-r--. 1 wvxvw wvxvw 202979 Nov 8 2014 assignment-11a.pdf
-rw-rw-r--. 1 wvxvw wvxvw 63359 Nov 2 2014 assignment-11.html
-rw-rw-r--. 1 wvxvw wvxvw 22187 May 9 12:04 assignment-11.org
-rw-rw-r--. 1 wvxvw wvxvw 274225 Nov 2 2014 assignment-11.pdf
-rw-rw-r--. 1 wvxvw wvxvw 14588 Nov 21 2014 assignment-12.org
-rw-rw-r--. 1 wvxvw wvxvw 191167 Nov 21 2014 assignment-12.pdf
-rw-rw-r--. 1 wvxvw wvxvw 16681 Dec 27 2014 assignment-13.org
-rw-rw-r--. 1 wvxvw wvxvw 303757 Dec 27 2014 assignment-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 11274 Jan 11 2015 assignment-14.org
-rw-rw-r--. 1 wvxvw wvxvw 184519 Jan 11 2015 assignment-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 9703 Jan 24 2015 assignment-15.org
-rw-rw-r--. 1 wvxvw wvxvw 203870 Jan 24 2015 assignment-15.pdf
-rw-rw-r--. 1 wvxvw wvxvw 5525 Feb 4 2015 assignment-16.org
-rw-rw-r--. 1 wvxvw wvxvw 172066 Feb 4 2015 assignment-16.pdf
drwxrwxr-x. 2 wvxvw wvxvw 4096 Nov 2 2014 css
-rw-------. 1 wvxvw wvxvw 59 Nov 25 2014 .directory
-rw-rw-r--. 1 wvxvw wvxvw 416499 Oct 31 2014 exercises.pdf
drwxrwxr-x. 8 wvxvw wvxvw 4096 May 12 17:06 .git
-rw-rw-r--. 1 wvxvw wvxvw 521 Feb 4 2015 .gitignore
Example usage:
# find all directories not containing `html' files
>>> dirs_missing_files('./*', 'html')
set(['./automata-theory', './intro-to-math',
'./intro-to-java', './infinitesimal-calculus',
'./discrete-mathematics'])
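Applied to the question's layout, the same function needs a mask that reaches the AEP Files folders. This sketch uses os.path.join instead of string concatenation to keep the separators consistent on Windows; the call in the comment uses the root from the question. Two caveats: a scene with no AEP Files folder at all never appears in glob's results, and on POSIX systems the *.aep match is case-sensitive.

```python
import glob
import os


def dirs_missing_files(mask, extension):
    """Return paths matching mask that contain no *.extension file."""
    directories = glob.glob(mask)
    files = glob.glob(os.path.join(mask, '*.' + extension))
    nonempty = map(os.path.dirname, files)
    return set(directories) - set(nonempty)


# Hypothetical call for the question's layout:
# dirs_missing_files(os.path.join(r'P:\ProjectName\Scenes', '*', '*', 'AEP Files'), 'aep')
```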
glob instead of manipulating directories. It would make it easier and shorter, i.e. first select all {root}/*/*/*/*.aep files, then from these figure out which folders contain *.aep files, diff that with the list of all folders, and you get the missing scenes.
glob. I'm going to look it up now, but if you want to write an answer about it in the meantime (even if it's short), please do.
glob will only match exact characters.
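The glob-first idea from the comments can be sketched as below, assuming file names follow the sample path. Glob wildcards (*, ? and [...] character sets) cannot express "one or more digits", so here glob only narrows the search by folder structure and a regular expression validates each name and extracts its version. latest_aep_files and AEP_RE are hypothetical names.

```python
import glob
import os
import re

# Hypothetical pattern; glob narrows by structure, the regex checks names.
AEP_RE = re.compile(r'proj_e\d+_q\d+_s\d+_v(\d+)\.aep', re.IGNORECASE)


def latest_aep_files(root):
    """Map each AEP Files folder to the path of its highest-version file."""
    pattern = os.path.join(root, '*', '*', 'AEP Files', '*.aep')
    best = {}  # folder -> (version, path)
    for path in glob.glob(pattern):
        match = AEP_RE.fullmatch(os.path.basename(path))
        if match is None:
            continue  # right place, wrong name
        folder = os.path.dirname(path)
        version = int(match.group(1))
        if folder not in best or version > best[folder][0]:
            best[folder] = (version, path)
    return {folder: path for folder, (_, path) in best.items()}
```

Folders absent from the returned dictionary are the ones with no valid file, so diffing its keys against the list of all AEP Files folders gives the missing scenes, as the first comment describes.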