I am using a recursive algorithm to find all of the file paths in a given directory. It returns a dictionary like this: {'Tkinter.py': 'C:\Python27\Lib\lib-tk\Tkinter.py', ...}.
I am using this in a script that opens modules given only their name. Currently, the whole process (for everything in sys.path) takes about 9 seconds. To avoid doing this every time, I save the result to a .pkl file and then just load it in my module-opener program.
The original recursive method took too long and sometimes gave me a MemoryError, so I created a helper method that iterates through the subfolders (using os.listdir) and then calls the recursive method.
Here is my code:
    import os, os.path

    def getDirs(path):
        sub = os.listdir(path)
        paths = {}
        for p in sub:
            print p
            pDir = '{}\{}'.format(path, p)
            if os.path.isdir(pDir):
                paths.update(getAllDirs(pDir, paths))
            else:
                paths[p] = pDir
        return paths

    def getAllDirs(mainPath, paths = {}):
        subPaths = os.listdir(mainPath)
        for path in subPaths:
            pathDir = '{}\{}'.format(mainPath, path)
            if os.path.isdir(pathDir):
                paths.update(getAllDirs(pathDir, paths))
            else:
                paths[path] = pathDir
        return paths

Is there any way to make this faster? Thanks!
1 Answer
    import os, os.path
    def getDirs(path):

Python convention is to use lowercase_with_underscores for function names.

        sub = os.listdir(path)

Don't needlessly abbreviate, and at least make it a plural name.

        paths = {}
        for p in sub:

You don't need to store things in a temporary variable to loop over them.

            print p

Do you really want this function printing?

            pDir = '{}\{}'.format(path, p)

Use os.path.join to join paths. That'll make sure it works regardless of your platform.

            if os.path.isdir(pDir):
                paths.update(getAllDirs(pDir, paths))

You shouldn't both pass it in and update it afterwards. That's redundant.

            else:
                paths[p] = pDir
        return paths

    def getAllDirs(mainPath, paths = {}):

Don't use mutable objects as default values: the default is created once and shared across calls, which leads to unexpected behavior.

        subPaths = os.listdir(mainPath)
        for path in subPaths:
            pathDir = '{}\{}'.format(mainPath, path)
            if os.path.isdir(pathDir):
                paths.update(getAllDirs(pathDir, paths))
            else:
                paths[path] = pathDir
        return paths

This whole section is repeated from the previous function. You should combine them.
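
To make the point about mutable default values concrete, here is a minimal sketch. The names collect_shared and collect_fresh are made up for illustration and do not come from your code. The first function reuses one dictionary across calls, while the second uses None as a sentinel so that each call starts with a new dictionary.

```python
def collect_shared(name, paths={}):
    # The default dict is created once, when the function is defined,
    # so every call that omits `paths` mutates the same object.
    paths[name] = True
    return paths


def collect_fresh(name, paths=None):
    # None acts as a sentinel: each call without `paths` gets a new dict.
    if paths is None:
        paths = {}
    paths[name] = True
    return paths


first = collect_shared('a.py')
second = collect_shared('b.py')
# `first` and `second` are the same dict, and it now holds both keys.

fresh_first = collect_fresh('a.py')
fresh_second = collect_fresh('b.py')
# `fresh_first` and `fresh_second` are separate dicts with one key each.
```

With getAllDirs this probably means that results from one call can leak into a later call that relies on the default, which may also contribute to the memory growth you saw.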
Take a look at the os.walk function. It does most of the work you're doing here, and you could use it to simplify your code.
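
As a rough sketch of what that could look like, the two functions might collapse into a single one built on os.walk and os.path.join. The name get_file_paths is only a suggestion, and I have not timed this against your version, so treat it as a starting point.

```python
import os


def get_file_paths(root):
    """Map each file name found under `root` to its full path."""
    paths = {}
    # os.walk yields (dirpath, dirnames, filenames) for `root` and
    # every directory below it, so no explicit recursion is needed.
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            # As in the original code, a later file with the same name
            # overwrites an earlier entry.
            paths[filename] = os.path.join(dirpath, filename)
    return paths
```

You would call it once per entry in sys.path and merge the resulting dictionaries, for example with paths.update(get_file_paths(entry)).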