\$\begingroup\$

I haven't used regex a lot and I needed to set up a script that can gather a list of file paths that should adhere to a strict formatting convention, so I thought that sounded like a good opportunity to use them.

To explain a bit, there's a set of sequence folders inside a root folder, and in each sequence is a set of scene folders. Each scene folder contains a fixed set of identically named folders, the relevant one being "AEP Files". That folder holds a set of After Effects files, of which I want the highest-numbered version, which is denoted by _v##.aep at the end of the file name. A sample path might look like this:

P:\ProjectName\Scenes\e10_q04\e10_q04_s12\AEP Files\proj_e10_q04_s12_v04.aep

I'm particularly interested to know

  1. If I'm using regex correctly,
  2. Whether I should use something other than if re.match(...)?
  3. Whether I could make it more efficient (particularly the list comprehensions); and
  4. Given the complexity, how is the current readability?

I had ideas on more efficiency but they involved collapsing list comprehensions down and I thought that would make an unreadable mess, so I opted not to. Any other feedback you want to give is also welcome!

import os
import re

root = r'P:\ProjectName\Scenes'
IGNORE = re.IGNORECASE

folders = [os.path.join(root, f) for f in os.listdir(root)
           if re.match(r'e\d*q\d*', f, IGNORE)]
folders.sort()

scene_folders = [os.path.join(folder, f) for folder in folders
                 for f in os.listdir(folder)
                 if re.match(r'e\d*_q\d*_s\d*', f, IGNORE)]
scene_folders.sort()

scenes = []
missing_scenes = []
for folder in scene_folders:
    matches = [re.match(r'proj_q\d*_s\d*_v(\d*)\.aep', f, IGNORE)
               for f in os.listdir(os.path.join(folder, "AEP Files"))
               if re.match(r'proj_q\d*_s\d*_v(\d*)\.aep', f, IGNORE)]
    matches = [(match.group(), match.groups()[0]) for match in matches]
    if matches:
        scenes.append(os.path.join(folder, sorted(matches)[-1][0]))
    else:
        missing_scenes.append(folder)
Quill
asked Sep 1, 2015 at 9:41
\$\endgroup\$
  • \$\begingroup\$ You could use glob instead of manipulating directories. It would be easier and shorter: first select all {root}/*/*/*/*.aep files, work out from those which folders contain *.aep files, then diff that against the list of all folders to get the missing scenes. \$\endgroup\$ Commented Sep 1, 2015 at 12:48
  • \$\begingroup\$ @wvxvw Ah, I've never used glob. I'm going to look it up now, but if you want to write an answer about it in the meantime (even if it's short), please do. \$\endgroup\$ Commented Sep 1, 2015 at 13:17
  • \$\begingroup\$ @wvxvw In particular, I'm curious to know if there's any way to apply regex-like patterns to it. I'd like to catch files that accidentally have a different number of digits in their path, and my limited understanding tells me that glob will only match exact characters. \$\endgroup\$ Commented Sep 1, 2015 at 13:36

2 Answers

\$\begingroup\$
  • root should be capitalized as it is a constant.
  • You can shorten the code by using sorted, which returns a new list, instead of sorting in place.
  • IGNORE, in my opinion, reduces readability, as it is less obvious than IGNORECASE.
  • Your regexes are pretty hard to read; I would extract them into constants and give them names.
  • You make good use of list comprehensions, they shorten and simplify code, good job!
  • Performance may not be a primary concern here, but it is a good habit to use lazy generators: putting round parentheses instead of square brackets around the first matches comprehension avoids building the intermediate list, saving memory and time. The same goes for scene_folders (once the in-place .sort() call is replaced by sorted()).
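As a rough sketch of how these points could fit together: the constant and function names below are hypothetical, and the patterns are an assumption based on the sample path in the question (they include the e##_ part and the underscore that the question's own patterns omit, so those would not match the sample path as written). Treating a missing "AEP Files" folder as a missing scene is also an assumption; the original code would raise an error there.

```python
import os
import re

# Hypothetical named patterns, assumed from the sample path
# ...\e10_q04\e10_q04_s12\AEP Files\proj_e10_q04_s12_v04.aep
SEQUENCE_RE = re.compile(r'e\d+_q\d+$', re.IGNORECASE)
SCENE_RE = re.compile(r'e\d+_q\d+_s\d+$', re.IGNORECASE)
AEP_RE = re.compile(r'proj_e\d+_q\d+_s\d+_v(\d+)\.aep$', re.IGNORECASE)


def matching_subfolders(parent, pattern):
    """Lazily yield paths of entries in parent whose names match pattern."""
    return (os.path.join(parent, name)
            for name in sorted(os.listdir(parent))
            if pattern.match(name))


def latest_aep(folder):
    """Return the path of the highest-versioned .aep in folder, or None."""
    # Generator: each name is matched once and no intermediate list is built.
    matches = (AEP_RE.match(name) for name in os.listdir(folder))
    # Compare versions as integers so that v10 ranks above v9.
    versions = [(int(m.group(1)), m.group()) for m in matches if m]
    if not versions:
        return None
    return os.path.join(folder, max(versions)[1])


def collect_scenes(root):
    """Return (latest .aep per scene, scene folders with no matching .aep)."""
    scenes, missing = [], []
    for sequence in matching_subfolders(root, SEQUENCE_RE):
        for scene in matching_subfolders(sequence, SCENE_RE):
            aep_dir = os.path.join(scene, 'AEP Files')
            # Assumption: a scene without an "AEP Files" folder counts as missing.
            latest = latest_aep(aep_dir) if os.path.isdir(aep_dir) else None
            if latest:
                scenes.append(latest)
            else:
                missing.append(scene)
    return scenes, missing
```

Calling collect_scenes(r'P:\ProjectName\Scenes') would then return the two lists that the original script builds at module level. Each regex is compiled once and matched once per file name, rather than twice as in the original comprehension.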
answered Sep 1, 2015 at 10:51
\$\endgroup\$
\$\begingroup\$

Here's an example. Since it'd be hard for me to replicate your file system, I've also included the directory listing the example runs against:

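import glob
import os


def dirs_missing_files(mask, extension):
    # Everything matching the mask (expected to be directories).
    directories = glob.glob(mask)
    # Parent directories of every file that has the given extension.
    nonempty = map(os.path.dirname, glob.glob(mask + '/*.' + extension))
    # Whatever is left over contains no file with that extension.
    return set(directories) - set(nonempty)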

Directory listing:

/home/wvxvw/Documents/uni:
total used in directory 56 available 39216308
drwxrwxr-x. 2 wvxvw wvxvw 4096 Aug 18 22:30 automata-theory
-rw-------. 1 wvxvw wvxvw 60 Mar 27 19:56 .directory
drwxrwxr-x. 3 wvxvw wvxvw 12288 Jul 10 18:46 discrete-mathematics
drwxrwxr-x. 4 wvxvw wvxvw 4096 Jun 27 19:27 infinitesimal-calculus
drwxrwxr-x. 6 wvxvw wvxvw 4096 Aug 17 17:13 intro-to-java
drwxrwxr-x. 4 wvxvw wvxvw 4096 Feb 14 2015 intro-to-math
drwxrwxr-x. 5 wvxvw wvxvw 4096 Jul 15 14:26 intro-to-statistics
drwxrwxr-x. 4 wvxvw wvxvw 4096 May 9 13:46 linear-algebra-1
/home/wvxvw/Documents/uni/automata-theory:
total used in directory 280 available 39216308
-rw-r-----. 1 wvxvw wvxvw 272279 Aug 18 22:30 syllabus.pdf
/home/wvxvw/Documents/uni/discrete-mathematics:
total used in directory 4524 available 39216308
-rw-rw-r--. 1 wvxvw wvxvw 12219 Mar 26 00:34 assignment-11.org
-rw-rw-r--. 1 wvxvw wvxvw 256614 May 16 13:05 assignment-11.pdf
-rw-rw-r--. 1 wvxvw wvxvw 13588 Apr 19 00:52 assignment-12.org
-rw-rw-r--. 1 wvxvw wvxvw 252578 Apr 19 00:52 assignment-12.pdf
-rw-rw-r--. 1 wvxvw wvxvw 10193 May 5 21:14 assignment-13.org
-rw-rw-r--. 1 wvxvw wvxvw 231959 May 5 21:14 assignment-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 9577 Jul 10 18:46 assignment-14.org
-rw-rw-r--. 1 wvxvw wvxvw 241745 Jul 10 18:46 assignment-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 11861 Jun 5 12:54 assignment-15.org
-rw-rw-r--. 1 wvxvw wvxvw 222046 Jun 5 12:54 assignment-15.pdf
-rw-rw-r--. 1 wvxvw wvxvw 11012 Jun 13 19:37 assignment-16.org
-rw-rw-r--. 1 wvxvw wvxvw 191311 Jun 13 19:37 assignment-16.pdf
-rw-rw-r--. 1 wvxvw wvxvw 6124 Jun 15 13:33 eulerian-graph-no-perfect-matching.png
drwxrwxr-x. 8 wvxvw wvxvw 4096 Jul 10 18:46 .git
-rw-rw-r--. 1 wvxvw wvxvw 650 Jun 15 14:07 .gitignore
-rw-rw-r--. 1 wvxvw wvxvw 9359 Jun 3 18:54 helpers.pl
-rw-r-----. 1 wvxvw wvxvw 476042 Mar 21 16:20 problems.pdf
/home/wvxvw/Documents/uni/infinitesimal-calculus:
total used in directory 3816 available 39216308
-rw-rw-r--. 1 wvxvw wvxvw 25952 Mar 21 13:03 assignment-11.org
-rw-rw-r--. 1 wvxvw wvxvw 235003 Mar 21 13:03 assignment-11.pdf
-rw-rw-r--. 1 wvxvw wvxvw 17516 Apr 16 09:17 assignment-12.org
-rw-rw-r--. 1 wvxvw wvxvw 235187 Apr 16 09:17 assignment-12.pdf
-rw-rw-r--. 1 wvxvw wvxvw 16124 May 4 22:46 assignment-13.org
-rw-rw-r--. 1 wvxvw wvxvw 236487 May 4 22:46 assignment-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 22196 May 21 02:35 assignment-14.org
-rw-rw-r--. 1 wvxvw wvxvw 331419 May 21 02:35 assignment-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 16690 Jun 20 23:26 assignment-17.org
-rw-rw-r--. 1 wvxvw wvxvw 248733 Jun 20 23:26 assignment-17.pdf
-rw-rw-r--. 1 wvxvw wvxvw 6575 Jun 27 19:27 assignment-18.org
-rw-rw-r--. 1 wvxvw wvxvw 174323 Jun 27 19:27 assignment-18.pdf
drwxrwxr-x. 8 wvxvw wvxvw 4096 Aug 21 16:35 .git
-rw-rw-r--. 1 wvxvw wvxvw 249 May 21 02:38 .gitignore
-rw-rw-r--. 1 wvxvw wvxvw 744 Mar 14 16:59 helpers.lisp
drwxrwxr-x. 2 wvxvw wvxvw 4096 May 21 02:35 images
-rw-r-----. 1 wvxvw wvxvw 92350 Apr 3 15:48 problems-12.pdf
-rw-r-----. 1 wvxvw wvxvw 40325 Apr 23 19:00 problems-13.pdf
-rw-r-----. 1 wvxvw wvxvw 37484 May 9 11:32 problems-14.pdf
-rw-r-----. 1 wvxvw wvxvw 37945 Jun 19 12:30 problems-17.pdf
-rw-r-----. 1 wvxvw wvxvw 39231 Jun 19 12:31 problems-18.pdf
-rw-r-----. 1 wvxvw wvxvw 35856 Jun 19 12:31 problems-19.pdf
/home/wvxvw/Documents/uni/intro-to-java:
total used in directory 292 available 39216308
drwxrwxr-x. 2 wvxvw wvxvw 4096 Aug 18 17:58 etc
drwxrwxr-x. 8 wvxvw wvxvw 4096 Aug 18 18:03 .git
-rw-rw-r--. 1 wvxvw wvxvw 10 Aug 15 14:32 .gitignore
-rw-r-----. 1 wvxvw wvxvw 249345 Aug 10 20:44 java-silibus.pdf
-rw-rw-r--. 1 wvxvw wvxvw 2393 Aug 16 12:54 README.org
drwxrwxr-x. 4 wvxvw wvxvw 4096 Aug 15 13:34 src
drwxrwxr-x. 12 wvxvw wvxvw 4096 Aug 18 18:02 target
/home/wvxvw/Documents/uni/intro-to-statistics:
total used in directory 4524 available 39216308
-rw-rw-r--. 1 wvxvw wvxvw 314127 Apr 1 21:11 assignment-11.doc
-rw-rw-r--. 1 wvxvw wvxvw 21077 Apr 1 21:09 assignment-11.org
-rw-rw-r--. 1 wvxvw wvxvw 294074 Apr 1 21:10 assignment-11.pdf
-rw-rw-r--. 1 wvxvw wvxvw 22752 Apr 15 23:01 assignment-12.org
-rw-rw-r--. 1 wvxvw wvxvw 277692 Apr 15 23:01 assignment-12.pdf
-rw-rw-r--. 1 wvxvw wvxvw 14053 May 3 13:14 assignment-13.org
-rw-rw-r--. 1 wvxvw wvxvw 270695 May 3 13:14 assignment-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 11482 May 28 15:58 assignment-14.org
-rw-rw-r--. 1 wvxvw wvxvw 208160 May 28 15:58 assignment-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 17096 Jun 18 14:24 assignment-15.org
-rw-rw-r--. 1 wvxvw wvxvw 253588 Jun 18 14:25 assignment-15.pdf
-rw-rw-r--. 1 wvxvw wvxvw 3382 Apr 1 22:07 bogus1.html
-rw-rw-r--. 1 wvxvw wvxvw 3184 Apr 1 21:53 bogus.html
-rw-------. 1 wvxvw wvxvw 60 Mar 27 19:56 .directory
-rw-r-----. 1 wvxvw wvxvw 91041 Jul 15 14:25 exam-1.pdf
-rw-r-----. 1 wvxvw wvxvw 1448545 Jul 15 14:26 exam1-solution.pdf
drwxrwxr-x. 8 wvxvw wvxvw 4096 Jun 15 14:08 .git
-rw-rw-r--. 1 wvxvw wvxvw 143 Apr 10 22:45 .gitignore
drwxrwxr-x. 2 wvxvw wvxvw 4096 Mar 27 19:57 helpers
-rw-rw-r--. 1 wvxvw wvxvw 6919 Apr 14 23:34 helpers.lisp
-rw-rw-r--. 1 wvxvw wvxvw 3859 May 25 23:28 helpers.py
drwxrwxr-x. 2 wvxvw wvxvw 4096 Apr 1 20:12 images
-rw-r-----. 1 wvxvw wvxvw 111674 Mar 27 10:39 problems-11.pdf
-rw-r-----. 1 wvxvw wvxvw 77819 Apr 10 20:20 problems-12.pdf
-rw-r-----. 1 wvxvw wvxvw 77599 Apr 23 11:51 problems-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 119288 May 25 16:57 problems-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 90508 Jun 15 14:06 problems-15.pdf
-rwxrwxr-x. 1 wvxvw wvxvw 785 Apr 1 22:06 submit.sh
/home/wvxvw/Documents/uni/linear-algebra-1:
total used in directory 3152 available 39216236
-rw-rw-r--. 1 wvxvw wvxvw 28703 Nov 8 2014 assignment-11a.html
-rw-rw-r--. 1 wvxvw wvxvw 25346 Nov 8 2014 assignment-11a.org
-rw-rw-r--. 1 wvxvw wvxvw 202979 Nov 8 2014 assignment-11a.pdf
-rw-rw-r--. 1 wvxvw wvxvw 63359 Nov 2 2014 assignment-11.html
-rw-rw-r--. 1 wvxvw wvxvw 22187 May 9 12:04 assignment-11.org
-rw-rw-r--. 1 wvxvw wvxvw 274225 Nov 2 2014 assignment-11.pdf
-rw-rw-r--. 1 wvxvw wvxvw 14588 Nov 21 2014 assignment-12.org
-rw-rw-r--. 1 wvxvw wvxvw 191167 Nov 21 2014 assignment-12.pdf
-rw-rw-r--. 1 wvxvw wvxvw 16681 Dec 27 2014 assignment-13.org
-rw-rw-r--. 1 wvxvw wvxvw 303757 Dec 27 2014 assignment-13.pdf
-rw-rw-r--. 1 wvxvw wvxvw 11274 Jan 11 2015 assignment-14.org
-rw-rw-r--. 1 wvxvw wvxvw 184519 Jan 11 2015 assignment-14.pdf
-rw-rw-r--. 1 wvxvw wvxvw 9703 Jan 24 2015 assignment-15.org
-rw-rw-r--. 1 wvxvw wvxvw 203870 Jan 24 2015 assignment-15.pdf
-rw-rw-r--. 1 wvxvw wvxvw 5525 Feb 4 2015 assignment-16.org
-rw-rw-r--. 1 wvxvw wvxvw 172066 Feb 4 2015 assignment-16.pdf
drwxrwxr-x. 2 wvxvw wvxvw 4096 Nov 2 2014 css
-rw-------. 1 wvxvw wvxvw 59 Nov 25 2014 .directory
-rw-rw-r--. 1 wvxvw wvxvw 416499 Oct 31 2014 exercises.pdf
drwxrwxr-x. 8 wvxvw wvxvw 4096 May 12 17:06 .git
-rw-rw-r--. 1 wvxvw wvxvw 521 Feb 4 2015 .gitignore

Example usage:

# find all directories not containing `html' files
>>> dirs_missing_files('./*', 'html')
set(['./automata-theory', './intro-to-math', 
 './intro-to-java', './infinitesimal-calculus', 
 './discrete-mathematics'])
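
On the follow-up question in the comments about regex-like patterns: glob patterns follow fnmatch rules, so besides * and ? they accept character classes such as [0-9] (for example v[0-9][0-9].aep), but they have no quantifiers or groups. One possible approach, sketched here under the assumption that the strict convention is exactly two digits per number as in the question's sample path, is to glob broadly and then check each file name against a regex. The names STRICT_RE and split_by_convention are made up for illustration.

```python
import glob
import os
import re

# Assumed strict convention, taken from the sample file name
# proj_e10_q04_s12_v04.aep (two digits for each number).
STRICT_RE = re.compile(r'proj_e\d{2}_q\d{2}_s\d{2}_v\d{2}\.aep$',
                       re.IGNORECASE)


def split_by_convention(root):
    """Return (conforming, nonconforming) .aep paths under root/*/*/AEP Files."""
    # glob.escape keeps any special characters in root from acting as wildcards.
    pattern = os.path.join(glob.escape(root), '*', '*', 'AEP Files', '*.aep')
    conforming, nonconforming = [], []
    for path in sorted(glob.glob(pattern)):
        name = os.path.basename(path)
        if STRICT_RE.match(name):
            conforming.append(path)
        else:
            nonconforming.append(path)
    return conforming, nonconforming
```

The second list is where files with the wrong number of digits would show up, which a pure glob pattern like v[0-9][0-9].aep would silently skip instead.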
answered Sep 1, 2015 at 13:48
\$\endgroup\$
