I have written the script below for use in Python 2.7.x. In essence, I want the script to access a folder contained within the directory the script is in, and then add all files contained within it to a list. I then open these files using the csv module and, for now, process each line.
My script works fine as below, but it seems a long-winded way to access the contents of files stored in a child directory.
Any suggestions to improve it?
    import os
    import csv

    child_files = []
    mypath = "./child_directory/"
    onlyfiles = [f for f in os.listdir(mypath) if os.path.isfile(os.path.join(mypath, f))]
    for f in onlyfiles:
        file_path = os.path.relpath(f)
        x = os.path.join(mypath, file_path)
        child_files.append(x)
        print x

    for f in child_files:
        with open(f, 'rb') as x:
            reader = csv.reader(x)
            for row in reader:
                print row
I'd also be keen to only add '.csv' files contained within the child directory. I can work on this after a review of my script, unless someone can suggest a Pythonic method of incorporating it.
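(For reference, one common way to pick up only the .csv files is the glob module; this is a sketch rather than part of the script above, and the directory name is the same placeholder used there:)

```python
import glob
import os

mypath = "./child_directory/"

# glob.glob returns paths that already include mypath, so no further
# os.path.join is needed; the isfile check skips any matching directories
csv_files = [f for f in glob.glob(os.path.join(mypath, "*.csv"))
             if os.path.isfile(f)]
```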
3 Answers
I suggest a generator:
    def children_files(dir):
        onlyfiles = (f for f in os.listdir(dir) if os.path.isfile(os.path.join(dir, f)))
        for f in onlyfiles:
            file_path = os.path.relpath(f)
            yield os.path.join(dir, file_path)
It is simpler to write (no append) and lazier: paths are produced only as they are consumed, so files are read just as needed.
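A minimal illustration of that laziness, using a toy generator unrelated to the file system:

```python
def numbers():
    # each value is computed only when the consumer asks for it
    for i in range(3):
        print("producing %d" % i)
        yield i

gen = numbers()    # no output yet: nothing runs until a value is requested
first = next(gen)  # only now is "producing 0" printed; first == 0
```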
I would mostly do two things:
Use a generator comprehension, or write a full-fledged generator function. Both are achievable; some people believe comprehensions are the only way to use Python, but I think an iterative approach reads quite nicely here.
Use functools.partial to increase the readability of the code. Reading os.path.something over and over is tedious, and the repeated attribute lookups also have a small negative impact on performance.
    import functools
    import os

    def child_files(directory):
        prepend_dir = functools.partial(os.path.join, directory)
        for file_name in os.listdir(directory):
            if os.path.isfile(prepend_dir(file_name)):
                yield prepend_dir(os.path.relpath(file_name))

    # Or, as a generator expression:
    prepend_dir = functools.partial(os.path.join, directory)
    child_files = (
        prepend_dir(os.path.relpath(file_name))
        for file_name in os.listdir(directory)
        if os.path.isfile(prepend_dir(file_name))
    )
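On its own, the functools.partial idea looks like this (the directory name is illustrative):

```python
import functools
import os

directory = "."  # any base directory
prepend_dir = functools.partial(os.path.join, directory)

# prepend_dir(name) is now shorthand for os.path.join(directory, name)
path = prepend_dir("example.csv")
```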
The second half of your code can be improved by using file_name instead of f. This is because f is usually used for a file object; using it for the name forces the actual file object to be named x, which can be confusing.
Another interesting alternative is os.walk. os.walk is a generator, yielding (root, dirs, files) tuples:

root: the base directory it is currently visiting
dirs: the sub-directories of the directory currently being visited
files: the files in the directory currently being visited

The benefit for you is that files already contains only the files, so you don't need to separate them manually as you did in your code. The only catch is that os.walk normally continues to descend into all subdirectories. You can make it stop after the first directory by clearing the contents of dirs. Like this:
Like this:
    import os
    import csv

    mypath = "./child_directory/"

    def child_files(basedir):
        for root, dirs, files in os.walk(basedir):
            for name in files:
                if name.endswith('.csv'):
                    path = os.path.join(root, name)
                    yield path
            dirs[:] = []  # don't go into sub-directories

    for f in child_files(mypath):
        with open(f, 'rb') as x:
            reader = csv.reader(x)
            for row in reader:
                print(row)
Here, child_files is a generator yielding file paths; I also added the filtering of .csv files.
Lastly, I suggest using the print() function instead of the print statement. That way your script will be closer to being Python 3 compatible, and there are no downsides for you even if you stick with Python 2.