Find files by pattern and copy to target location

Question 1

This is my solution to the Chapter 9 exercise in Automate the Boring Stuff:

Selective Copy

Write a program that walks through a folder tree and searches for files with a certain file extension (such as .pdf or .jpg). Copy these files from whatever location they are in to a new folder.

#!/usr/local/bin/python3
#selectiveCopy.py - Walks through a folder tree with the extension .PDF and copies them to a new folder.
import re
import os
import shutil
filePdf = re.compile(r"""^(.*?) #text before extension
 [.][p][d][f] #extension .pdf
 """, re.VERBOSE)
def selectiveCopy(fileExtension):
 for pdfFiles in os.listdir("/Users//Desktop/newPDFs"):
 locatedFiles = filePdf.search(pdfFiles)
 if locatedFiles != None:
 print(locatedFiles)
 files_to_copy = os.path.join("/Users/my_user_name/Desktop/newPDFs", pdfFiles) `
 shutil.copy(files_to_copy, "/Users/my_user_name/Desktop/oldPDFs") 
 print("The following file has been copied " + files_to_copy)
selectiveCopy(filePdf)

I appreciate for probably all of you on here that it's not a very difficult exercise, but I've been working through this book slowly and have just decided it's probably a good idea to start putting them on a Github page. I wanted to know if I could improve the code to make it more efficient or if there are some things I've included that aren't necessarily considered "standard practice".

Question 2

Hey, welcome to Code Review! The problem description says to traverse a directory tree. This sounds to me like you actually need to descend down into sub-directories.

Question 3

If you want to recurse also into sub-folders, you should use os.walk:

import os
import shutil
def get_files_recursively(start_directory, filter_extension=None):
 for root, _, files in os.walk(start_directory):
 for file in files:
 if filter_extension is None or file.lower().endswith(filter_extension):
 yield os.path.join(root, file)
def selective_copy(source, target, file_extension=None):
 for file in get_files_recursively(source, file_extension):
 print(file)
 shutil.copy(file, target)
 print("The following file has been copied", file)
if __name__ == "__main__":
 selective_copy("/Users/my_user_name/Desktop/newPDFs",
 "/Users/my_user_name/Desktop/oldPDFs",
 ".pdf")

As stated in the documentation, this is actually faster than os.listdir since Python 3.5:

This function now calls os.scandir() instead of os.listdir(), making it faster by reducing the number of calls to os.stat().

I also

added a if __name__ == "__main__": guard to allow importing this module from another script without running the function
pulled the constants out into the calling code to make it re-usable
used is instead of != to compare to None
used str.endswith instead of a regular expression to avoid some overhead
changed the names to adhere to Python's official style-guide, PEP8.

Question 4

I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks.

Question 5

@LRBrady The one thing I did not link to in my answer:get_files_recursively is a generator

Question 6

A typo here: if filter is None

Question 7

Review

Your regex can be simplified:

There is no need for the multiple [] brackets - this would suffice: r"^(.*?)\.pdf$".

Note that I escape the . char (which will match anything) with a backslash \. to only match the specific . char, and the $ to be certain that .pdf is at the end of the string.
There is no need for Regex at all!
1. You can either use the glob module to directly find all files with an extension Python3.5+
```
def get_files(source, extension):
 for filename in glob.iglob(f'{source}/**/*{extension}', recursive=True):
 yield filename
```
2. use something like .split(".") or filename.endswith('.extension') to find if the file uses that extension. As @Graipher showed in his answer.
Currently your solution works with hardcoded directories.

To change the to directory to read or to write in, you'd need to change a lot in your code. Your solution would be much better if you could send these path locations as parameters.

Question 8

I tried using the regex r"^(.*?)\.pdf" initially and it was matching a lot of other files that weren't PDFs, but I could have made an error or might have changed some code after that. Your last point what do you mean? sorry I'm still new so I don't understand.

Question 9

I added a $ to the regex so it wont match files like "afile.pdfsonething.txt"

Graipher Graipher 41.6k7 gold badges70 silver badges134 bronze badges · Accepted Answer · 2018-08-17 13:09:05Z

If you want to recurse also into sub-folders, you should use os.walk:

import os
import shutil
def get_files_recursively(start_directory, filter_extension=None):
 for root, _, files in os.walk(start_directory):
 for file in files:
 if filter_extension is None or file.lower().endswith(filter_extension):
 yield os.path.join(root, file)
def selective_copy(source, target, file_extension=None):
 for file in get_files_recursively(source, file_extension):
 print(file)
 shutil.copy(file, target)
 print("The following file has been copied", file)
if __name__ == "__main__":
 selective_copy("/Users/my_user_name/Desktop/newPDFs",
 "/Users/my_user_name/Desktop/oldPDFs",
 ".pdf")

As stated in the documentation, this is actually faster than os.listdir since Python 3.5:

This function now calls os.scandir() instead of os.listdir(), making it faster by reducing the number of calls to os.stat().

I also

added a if __name__ == "__main__": guard to allow importing this module from another script without running the function
pulled the constants out into the calling code to make it re-usable
used is instead of != to compare to None
used str.endswith instead of a regular expression to avoid some overhead
changed the names to adhere to Python's official style-guide, PEP8.

I don't understand everything you did to be honest, but I'll look up some of the code, which will hopefully help me write a bit better. Thanks.
@LRBrady The one thing I did not link to in my answer:get_files_recursively is a generator

Stack Exchange Network

Find files by pattern and copy to target location

Selective Copy

2 Answers 2

Review

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Find files by pattern and copy to target location

Selective Copy

2 Answers 2

Review

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions