This is my solution to the Chapter 9 exercise in Automate the Boring Stuff:
Selective Copy
Write a program that walks through a folder tree and searches for files with a certain file extension (such as
.jpg
). Copy these files from whatever location they are in to a new folder.
#!/usr/local/bin/python3
#selectiveCopy.py - Walks through a folder tree with the extension .PDF and copies them to a new folder.
import re
import os
import shutil
filePdf = re.compile(r"""^(.*?) #text before extension
[.][p][d][f] #extension .pdf
""", re.VERBOSE)
def selectiveCopy(fileExtension):
for pdfFiles in os.listdir("/Users//Desktop/newPDFs"):
locatedFiles = filePdf.search(pdfFiles)
if locatedFiles != None:
print(locatedFiles)
files_to_copy = os.path.join("/Users/my_user_name/Desktop/newPDFs", pdfFiles) `
shutil.copy(files_to_copy, "/Users/my_user_name/Desktop/oldPDFs")
print("The following file has been copied " + files_to_copy)
selectiveCopy(filePdf)
I appreciate for probably all of you on here that it's not a very difficult exercise, but I've been working through this book slowly and have just decided it's probably a good idea to start putting them on a Github page. I wanted to know if I could improve the code to make it more efficient or if there are some things I've included that aren't necessarily considered "standard practice".
-
1\$\begingroup\$ Hey, welcome to Code Review! The problem description says to traverse a directory tree. This sounds to me like you actually need to descend down into sub-directories. \$\endgroup\$Graipher– Graipher2018年08月17日 12:50:22 +00:00Commented Aug 17, 2018 at 12:50
2 Answers 2
If you want to recurse also into sub-folders, you should use os.walk
:
import os
import shutil
def get_files_recursively(start_directory, filter_extension=None):
for root, _, files in os.walk(start_directory):
for file in files:
if filter_extension is None or file.lower().endswith(filter_extension):
yield os.path.join(root, file)
def selective_copy(source, target, file_extension=None):
for file in get_files_recursively(source, file_extension):
print(file)
shutil.copy(file, target)
print("The following file has been copied", file)
if __name__ == "__main__":
selective_copy("/Users/my_user_name/Desktop/newPDFs",
"/Users/my_user_name/Desktop/oldPDFs",
".pdf")
As stated in the documentation, this is actually faster than os.listdir
since Python 3.5:
This function now calls
os.scandir()
instead ofos.listdir()
, making it faster by reducing the number of calls toos.stat()
.
I also
- added a
if __name__ == "__main__":
guard to allow importing this module from another script without running the function - pulled the constants out into the calling code to make it re-usable
- used
is
instead of!=
to compare toNone
- used
str.endswith
instead of a regular expression to avoid some overhead - changed the names to adhere to Python's official style-guide, PEP8.
-
\$\begingroup\$ I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks. \$\endgroup\$Elarbe– Elarbe2018年08月17日 13:25:37 +00:00Commented Aug 17, 2018 at 13:25
-
-
2\$\begingroup\$ A typo here:
if filter is None
\$\endgroup\$hjpotter92– hjpotter922018年08月17日 18:58:05 +00:00Commented Aug 17, 2018 at 18:58
Review
Your regex can be simplified:
There is no need for the multiple
[]
brackets - this would suffice:r"^(.*?)\.pdf$"
.Note that I escape the
.
char (which will match anything) with a backslash\.
to only match the specific.
char, and the$
to be certain that .pdf is at the end of the string.There is no need for Regex at all!
You can either use the
glob
module to directly find all files with an extension Python3.5+def get_files(source, extension): for filename in glob.iglob(f'{source}/**/*{extension}', recursive=True): yield filename
use something like
.split(".")
orfilename.endswith('.extension')
to find if the file uses that extension. As @Graipher showed in his answer.
Currently your solution works with hardcoded directories.
To change the to directory to read or to write in, you'd need to change a lot in your code. Your solution would be much better if you could send these path locations as parameters.
-
\$\begingroup\$ I tried using the regex r"^(.*?)\.pdf" initially and it was matching a lot of other files that weren't PDFs, but I could have made an error or might have changed some code after that. Your last point what do you mean? sorry I'm still new so I don't understand. \$\endgroup\$Elarbe– Elarbe2018年08月17日 13:23:24 +00:00Commented Aug 17, 2018 at 13:23
-
2\$\begingroup\$ I added a
$
to the regex so it wont match files like "afile.pdfsonething.txt" \$\endgroup\$Ludisposed– Ludisposed2018年08月17日 13:25:54 +00:00Commented Aug 17, 2018 at 13:25
Explore related questions
See similar questions with these tags.