The purpose of the following function is to find all non-empty directories, and the files in those non-empty directories. It recursively checks each directory on an SFTP server to see if it has any files, and if it does, adds it to a default dict using the path as the key. The function uses paramiko.SFTPClient
and stat
. I am specifically concerned about the performance; it is rather slow.
Prereqsuite information
sftp.listdir_attr
returns a list ofSFTPAttribute
s which represent either files, directories, symlinks, etc., and contain ast_mode
, which is used to determine if it is a directory or file. This can throw an IOException for example if you don't have permissions to inspect the path.stat.S_ISDIR
will inspect the mode to determine if its a directory
The function in question:
def recursive_ftp(sftp, path='.', files=None):
if files is None:
files = defaultdict(list)
# loop over list of SFTPAttributes (files with modes)
for attr in sftp.listdir_attr(path):
if stat.S_ISDIR(attr.st_mode):
# If the file is a directory, recurse it
recursive_ftp(sftp, os.path.join(path,attr.filename), files)
else:
# if the file is a file, add it to our dict
files[path].append(attr.filename)
return files
Use:
import paramiko
import stat
transport = paramiko.Transport((host, port))
transport.connect(username=username, password=password)
sftp = paramiko.SFTPClient.from_transport(transport)
files = recursive_ftp(sftp)
If we have an SFTP server that looks like this:
/foo
----a.csv
----b.csv
/bar
----c.csv
/baz
The function will return a dictionary like so:
{
'./foo': ['a.csv', 'b.csv'],
'./bar': ['c.csv']
}
2 Answers 2
There is nothing obviously wrong with your implementation that could explain a slow behaviour. The slowest part here being the use of listdir_attr
, you might want to check with other means if its speed matches what your network has to offer.
That being said, there are a few changes you can do to improve a bit on your end:
- use a helper function so
files
will not be both a return value and modified in place; - use
paramiko
simulation of a working directory to remove the need foros.path
; - use list-comprehension to remove the need for
defaultdict
.
I'm also wondering whether you really want to list everything that is not a directory or only regular files (i.e. no symlinks, no block devices, etc.) You can change the proposed list-comprehension accordingly.
Proposed improvements
def _sftp_helper(sftp, files):
stats = sftp.listdir_attr('.')
files[sftp.getcwd()] = [attr.filename for attr in stats if stat.S_ISREG(attr.st_mode)]
for attr in stats:
if stat.S_ISDIR(attr.st_mode): # If the file is a directory, recurse it
sftp.chdir(attr.filename)
_sftp_helper(sftp, files)
sftp.chdir('..')
def filelist_recursive(sftp):
files = {}
_sftp_helper(sftp, files)
return files
You can adapt easily to include back the optional path
parameter into filelist_recursive
.
-
1\$\begingroup\$ I believe the OP used a defaultdict so that empty directories would not be in
files
. If I understand correctly, yours would still include them. \$\endgroup\$zondo– zondo2016年05月01日 11:27:48 +00:00Commented May 1, 2016 at 11:27 -
\$\begingroup\$ @zondo Didn't think of that. To me primary usage of defaultdict is to use
files[path].append
instead offiles.setdefault(path, []).append
. In the end I do think it's a usefull feature to list the names of all directories, even if they are empty. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2016年05月01日 11:34:59 +00:00Commented May 1, 2016 at 11:34
You miss imports for your function (I believe it is somewhere in your code, but not in the posted snippet):
from collections import defaultdict
import os
Creation of files
variable inside of the recursive function looks a bit weird from my non-Python point of view, but it has its advantages as you optional arguments can be skipped (scan different directory, append results to existing list of files), which seems to be a good design.
I can criticize the function naming. recursive_ftp
does not explain what does the function do. It is recursive (but it does not add much value) and it is not ftp
. I might think about list_sftp
?
Explore related questions
See similar questions with these tags.