Given a base path and a list of extensions, the task is to list all files in that path that match any of the extensions.
Two of my solutions are:
from glob import glob
from os import path
EXTENSIONS = ['*.zip', '*.jar', '*.pdf']
DOC_PATH = '/path/to/files'
# Solution1:
files = []
for ext in EXTENSIONS:
    files.extend(glob(path.join(DOC_PATH, ext)))
# works but looks very clumsy
# Solution2:
from functools import reduce  # reduce is a builtin on Python 2; Python 3 needs this import
files = reduce(lambda x, y: x + y,
               [glob(path.join(DOC_PATH, ext)) for ext in EXTENSIONS])
# Also functional but looks like a misuse of reduce
Have you got any other ideas?
2 Answers
If you only need to iterate over them (once) and don't need an actual list, you could use itertools.chain and glob.iglob:
files = chain(*(iglob(path.join(DOC_PATH, ext)) for ext in EXTENSIONS))
If you do need an actual list, you can further call list(files), of course.
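A minimal runnable sketch of the same idea, using chain.from_iterable instead of argument unpacking so the generator of iglob iterators is itself consumed lazily. The helper name iter_matching_files and the temporary demo directory are assumptions added for illustration; they are not part of the original answer.

```python
import tempfile
from glob import iglob
from itertools import chain
from os import path

EXTENSIONS = ['*.zip', '*.jar', '*.pdf']


def iter_matching_files(doc_path, patterns=EXTENSIONS):
    """Lazily yield paths in doc_path that match any of the glob patterns."""
    # chain.from_iterable walks the generator of iglob iterators one at a
    # time, so no intermediate lists are built.
    return chain.from_iterable(
        iglob(path.join(doc_path, pattern)) for pattern in patterns
    )


# Hypothetical demo directory, created only so the sketch runs as-is.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('a.zip', 'b.jar', 'c.pdf', 'd.txt'):
        open(path.join(demo_dir, name), 'w').close()
    found = sorted(path.basename(p) for p in iter_matching_files(demo_dir))
```

As with the one-liner above, the directory is still scanned once per pattern; only the list building is avoided.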
chain is quite a good idea. - ProfHase85, Jul 7, 2014 at 12:35
Using os.listdir and os.path.splitext:
import os
EXTENSIONS = 'zip jar pdf'.split()
EXTENSION_SET = set('.' + e for e in EXTENSIONS)
files = [f for f in os.listdir(DOC_PATH) if os.path.splitext(f)[1] in EXTENSION_SET]
Using os.listdir and re.search:
import os
import re
EXTENSIONS = 'zip jar pdf'.split()
EXTENSION_RE = re.compile(r'\.({})$'.format('|'.join(EXTENSIONS)))
files = [f for f in os.listdir(DOC_PATH) if EXTENSION_RE.search(f)]
# or files = list(filter(EXTENSION_RE.search, os.listdir(DOC_PATH)))
The advantage of these approaches is that you only iterate over the files in the directory once (rather than once for each extension, as in the glob case).
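One thing to watch when swapping these in for the glob versions: os.listdir returns bare file names, while glob returns the pattern's directory joined with each name. The sketch below keeps the single directory pass but joins the base path back on, and uses str.endswith with a tuple of suffixes as a third way to express the filter. The function name list_matching_files and the temporary demo directory are assumptions for illustration only.

```python
import os
import tempfile

# str.endswith accepts a tuple of suffixes, so no set or regex is needed.
EXTENSIONS = ('.zip', '.jar', '.pdf')


def list_matching_files(doc_path, extensions=EXTENSIONS):
    """Return full paths of entries in doc_path ending in one of extensions."""
    # A single os.listdir call: the directory is read once, however many
    # extensions are checked.
    return [
        os.path.join(doc_path, name)
        for name in os.listdir(doc_path)
        if name.endswith(extensions)
    ]


# Hypothetical demo directory, created only so the sketch runs as-is.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ('a.zip', 'b.jar', 'c.pdf', 'd.txt'):
        open(os.path.join(demo_dir, name), 'w').close()
    files = sorted(list_matching_files(demo_dir))
    names = [os.path.basename(f) for f in files]
    dirs = {os.path.dirname(f) for f in files}
```

Unlike a pattern such as '*.zip', this filter does not skip names that start with a dot, so the results may differ slightly from the glob versions on directories containing hidden files.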
sum((glob(path.join(DOC_PATH, ext)) for ext in EXTENSIONS), []).