I have written the script below for use in Python 2.7.x. In essence, I want the script to access a folder contained within the directory the script is in, and then add all files contained within it to a list. I then open these files using the csv module and, for now, process each line.
My script works fine as below, but it seems a long-winded way to access the contents of files stored in a child directory.
Any suggestions to improve it?
    import os
    import csv

    child_files = []
    mypath = "./child_directory/"
    onlyfiles = [f for f in os.listdir(mypath) if os.path.isfile(os.path.join(mypath, f))]
    for f in onlyfiles:
        file_path = os.path.relpath(f)
        x = os.path.join(mypath, file_path)
        child_files.append(x)
        print x

    for f in child_files:
        with open(f, 'rb') as x:
            reader = csv.reader(x)
            for row in reader:
                print row
I'd also be keen to only add '.csv' files contained within the child directory. I can work on this after a review of my script, unless someone can suggest a Pythonic method of incorporating it.
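(For reference, one common way to pick up only the .csv files is the glob module; this is a sketch rather than part of the script above, and the directory name is the same placeholder used there:)

```python
import glob
import os

mypath = "./child_directory/"

# glob.glob returns paths that already include mypath, so no further
# os.path.join is needed; the isfile check skips any matching directories
csv_files = [f for f in glob.glob(os.path.join(mypath, "*.csv"))
             if os.path.isfile(f)]
```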
3 Answers
I suggest a generator:
    def children_files(dir):
        onlyfiles = (f for f in os.listdir(dir) if os.path.isfile(os.path.join(dir, f)))
        for f in onlyfiles:
            file_path = os.path.relpath(f)
            yield os.path.join(dir, file_path)
It is simpler to write (no append) and lazier: paths are produced only as they are consumed, so files are read just as needed.
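A minimal illustration of that laziness, using a toy generator unrelated to the file system:

```python
def numbers():
    # each value is computed only when the consumer asks for it
    for i in range(3):
        print("producing %d" % i)
        yield i

gen = numbers()    # no output yet: nothing runs until a value is requested
first = next(gen)  # only now is "producing 0" printed; first == 0
```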
I would mostly do two things:
Use a generator comprehension, or write a full-fledged generator function. Both are achievable; some people believe comprehensions are the only way to use Python, but I think an iterative approach reads quite nicely here.
Use functools.partial to increase the readability of the code. Reading os.path.something over and over is tedious, and the repeated attribute lookups also have a small negative impact on performance.
    import functools
    import os

    def child_files(directory):
        prepend_dir = functools.partial(os.path.join, directory)
        for file_name in os.listdir(directory):
            if os.path.isfile(prepend_dir(file_name)):
                yield prepend_dir(os.path.relpath(file_name))

    # Or, as a generator expression:
    prepend_dir = functools.partial(os.path.join, directory)
    child_files = (
        prepend_dir(os.path.relpath(file_name))
        for file_name in os.listdir(directory)
        if os.path.isfile(prepend_dir(file_name))
    )
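On its own, the functools.partial idea looks like this (the directory name is illustrative):

```python
import functools
import os

directory = "."  # any base directory
prepend_dir = functools.partial(os.path.join, directory)

# prepend_dir(name) is now shorthand for os.path.join(directory, name)
path = prepend_dir("example.csv")
```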
The second half of your code can be improved by using file_name instead of f. This is because f is usually used for a file object; using it for the name forces the actual file object to be named x, which can be confusing.
Another interesting alternative is os.walk. os.walk is a generator, yielding (root, dirs, files) tuples:

root: the base directory it is currently visiting
dirs: the sub-directories of the directory currently being visited
files: the files in the directory currently being visited

The benefit for you is that files already contains only the files, so you don't need to separate them manually as you did in your code. The only catch is that os.walk normally continues to descend into all subdirectories. You can make it stop after the first directory by clearing the contents of dirs. Like this:
Like this:
    import os
    import csv

    mypath = "./child_directory/"

    def child_files(basedir):
        for root, dirs, files in os.walk(basedir):
            for name in files:
                if name.endswith('.csv'):
                    path = os.path.join(root, name)
                    yield path
            dirs[:] = []  # don't go into sub-directories

    for f in child_files(mypath):
        with open(f, 'rb') as x:
            reader = csv.reader(x)
            for row in reader:
                print(row)
Here, child_files is a generator yielding file paths; I also added the filtering of .csv files.
Lastly, I suggest using the print() function instead of the print statement. That way your script will be closer to being Python 3 compatible, and there are no downsides for you even if you stick with Python 2.