I sometimes need to merge some big PDF files. Looking around, I decided to use PyPDF2. All files are in the same directory as the script. The code works as intended and was tested with 5 small PDF files.
I found many variations and decided to write this one myself:
import glob
from PyPDF2 import PdfFileMerger
files = sorted(glob.glob('./*.pdf'))
merger = PdfFileMerger()
filename = input("Enter merged file name: ")
for file in files:
    merger.append(file)
    print(f"Processed file {file}")
merger.write(f'{filename}.pdf')
merger.close()
All files have titles such as 1.pdf and 2.pdf.
The writing to file bit seems sloppy to me, but I don't know how to or even if I should improve it. Did I ensure that the files list will always be in alphabetical order? Are there best practice things I missed? Any other feedback is welcome as well.
The code only needs to run on Windows. Cross-platform is not a concern for me. I don't anticipate memory constraints as the system it runs on has a lot of RAM. This will only run client side using Python 3.8.
Comment: Minor point, you don't have consistent quotes. – marcellothearcane, Sep 23, 2020 at 8:58
2 Answers
It looks good.
Options
You could make it more flexible by prompting the user for an input directory path in addition to the output file name. You could make this an option so that you retain the original behavior (the input directory is the current directory) by default. Consider using argparse.
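A minimal sketch of such an option, using only the standard library. The option names (`--input-dir`, the positional `output`) and the helper `parse_args` are my own illustrative choices, not anything from the original script:

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line options; the option names are illustrative only."""
    parser = argparse.ArgumentParser(description="Merge the PDF files in a directory.")
    parser.add_argument("output", help="name of the merged file, without the .pdf suffix")
    parser.add_argument(
        "-d", "--input-dir",
        type=Path,
        default=Path("."),  # default keeps the original behavior: current directory
        help="directory containing the input PDF files",
    )
    return parser.parse_args(argv)


# Example with an explicit argument list; a real script would call parse_args()
# with no arguments so that the values come from the command line.
args = parse_args(["merged", "--input-dir", "."])
files = sorted(args.input_dir.glob("*.pdf"))
print(f"Would merge {len(files)} file(s) into {args.output}.pdf")
```

With this in place, omitting `--input-dir` behaves like the current script, and the list of files can be passed to the merger loop unchanged.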
Error handling
If files is empty, which can happen if there are no .pdf input files, the code creates an empty output .pdf file. It would be nice to notify the user of this situation.
I also imagine there are common scenarios where the write call might fail. It might be nice to catch errors there as well.
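One possible way to catch a failed write, as a sketch. The helper name `write_merged` is hypothetical, and I am assuming the failures of interest surface as `OSError` subclasses (for example `PermissionError` or `FileNotFoundError`) when the output file is opened:

```python
import sys


def write_merged(merger, output_path):
    """Write the merged PDF and report failure instead of showing a traceback.

    `merger` is any object with write(path) and close() methods, such as the
    merger object in the question. Returns True on success, False otherwise.
    """
    try:
        merger.write(output_path)
    except OSError as err:  # covers PermissionError and FileNotFoundError
        print(f"Could not write {output_path}: {err}", file=sys.stderr)
        return False
    finally:
        merger.close()  # release the input files whether or not the write worked
    return True
```

In the original script this would replace the last two lines, for example `write_merged(merger, f"{filename}.pdf")`.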
Deprecated
I realize this is 4 years after you posted the question, but when I run the code, I get a deprecation error. I must replace PdfFileMerger with PdfMerger.
Nice use of an f-string!
import
Minor nit: I find reading "glob glob" distracting, as it makes me think of Cookie Monster turning files into cookie crumbs.
I prefer
from glob import glob
so the target code reads more smoothly.
empty list
I agree with @toolic that the most likely failure mode is that we accidentally chdir to an empty directory, or one that has text files but no PDF files. In that case we waste time prompting for the output filename, and then silently "succeed" by producing a useless tiny result file.
Simplest fix would be
import os
from glob import glob
files = sorted(glob("*.pdf"))
assert files, f"No PDF files found in {os.getcwd()}"
Or write slightly fancier code, using if not files: raise ValueError(...)
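The fancier variant might look roughly like this. The function name `find_pdfs` and its `directory` parameter are hypothetical, introduced only for illustration:

```python
import os
from glob import glob


def find_pdfs(directory="."):
    """Return the sorted PDF paths in `directory`, or raise if there are none."""
    files = sorted(glob(os.path.join(directory, "*.pdf")))
    if not files:
        # Unlike assert, this check still runs when Python is started with -O.
        raise ValueError(f"No PDF files found in {os.path.abspath(directory)}")
    return files
```

Calling it before the `input()` prompt means the user is not asked for an output name when there is nothing to merge.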
writable output directory
If the user happens to type in "out/result.pdf" and there's no out/ folder, then depending on your use case you might prefer the current "blow up!" behavior, which gives a good diagnostic. Or you might prefer that the script silently, automagically performs "mkdir out" behind the scenes for you.
Other failure modes seem rarer, and already give a helpful diagnostic, such as the case where result.pdf exists but we lack permission to write to it, perhaps due to a Windows process holding a lock on it.
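If you do want the automatic "mkdir" behavior, a sketch with pathlib could look like this. The helper name `prepare_output_path` is hypothetical:

```python
from pathlib import Path


def prepare_output_path(name):
    """Create the parent folder of `name` if needed and return it as a Path."""
    output_path = Path(name)
    # parents=True creates intermediate folders as well;
    # exist_ok=True makes this a no-op when the folder is already there.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    return output_path
```

The returned path can then be converted with `str()` and handed to `merger.write`. Leaving this helper out keeps the current "blow up!" behavior.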
argv
Consider getting the merged filename from sys.argv rather than input(). Or use typer to accomplish the same thing with less effort.
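A bare-bones sys.argv version, as a sketch. The function name `output_name` and the usage text are my own assumptions:

```python
import sys


def output_name(argv):
    """Return the merged file name from argv, or exit with a usage message."""
    if len(argv) != 2:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit(f"usage: {argv[0]} MERGED_NAME")
    return argv[1]
```

In the script, `filename = output_name(sys.argv)` would replace the `input()` call, so the merge can be started with something like `python merge.py merged`, where `merge.py` stands in for whatever the script is actually called.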