PyPDF2: merging PDF files

Question 1

I sometimes need to merge some big PDF files. Looking around, I decided to use PyPDF2. All files are in the same directory as the script. The code works as intended and was tested with 5 small PDF files.

I found many variations and decided to write this one myself:

import glob
from PyPDF2 import PdfFileMerger
files = sorted(glob.glob('./*.pdf'))
merger = PdfFileMerger()
filename = input("Enter merged file name: ")
for file in files:
    merger.append(file)
    print(f"Processed file {file}")
merger.write(f'{filename}.pdf')
merger.close()

All files have names such as 1.pdf and 2.pdf.

The writing-to-file part seems sloppy to me, but I don't know how, or even whether, I should improve it. Did I ensure that the files list will always be in alphabetical order? Are there any best practices I missed? Any other feedback is welcome as well.

The code only needs to run on Windows. Cross-platform is not a concern for me. I don't anticipate memory constraints as the system it runs on has a lot of RAM. This will only run client side using Python 3.8.

Comment

Minor point: your quotes are inconsistent. Some strings use single quotes ('./*.pdf') and others use double quotes ("Enter merged file name: ").

toolic · Answer 1 · 2025-02-17 12:12:31Z

It looks good.

Options

You could make it more flexible by prompting the user for an input directory path in addition to the output file name. You could make this optional, so that the original behavior (using the current directory as the input directory) is retained. Consider using argparse.
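A minimal sketch of that option follows. The flag names `--input-dir` and `--output` and the helper names are illustrative assumptions, not part of the original script. The input directory defaults to the current directory, so running with no flags keeps the original behavior, and the output name is still prompted for when it is not supplied.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for the merge script (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Merge all PDF files in a directory.")
    parser.add_argument(
        "-d", "--input-dir",
        default=".",
        help="directory containing the PDFs to merge (default: current directory)",
    )
    parser.add_argument(
        "-o", "--output",
        help="name of the merged file, without .pdf (prompted for if omitted)",
    )
    return parser


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv[1:] when None), falling back to the prompt."""
    args = build_parser().parse_args(argv)
    if args.output is None:
        # Keep the original interactive behavior when no output name is given.
        args.output = input("Enter merged file name: ")
    return args
```

With this in place, something like `python merge.py -d scans -o combined` (script name hypothetical) would skip the prompt, while a bare `python merge.py` behaves as before.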

Error handling

If files is empty, which can happen if there are no .pdf input files, the code creates an empty output .pdf file. It would be nice to notify the user of this situation.

I also imagine there are common scenarios where the write call might fail. It might be nice to catch errors there as well.
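The sketch below covers both points. It assumes PyPDF2 3.x, where the class is named `PdfMerger`, and the helper names `collect_pdfs` and `merge_pdfs` are hypothetical. The empty case is reported before anything is written. `OSError`, which includes `FileNotFoundError` and `PermissionError`, is caught around the merge and write, so a missing folder or a locked file produces a message instead of a traceback.

```python
import sys
from pathlib import Path
from typing import List


def collect_pdfs(directory: str = ".") -> List[str]:
    """Return the .pdf files in `directory`, sorted by name."""
    return sorted(str(path) for path in Path(directory).glob("*.pdf"))


def merge_pdfs(directory: str, output: str) -> int:
    """Merge the PDFs in `directory` into `output`; return an exit status."""
    files = collect_pdfs(directory)
    if not files:
        # Tell the user instead of writing a useless empty output file.
        print(f"No PDF files found in {Path(directory).resolve()}", file=sys.stderr)
        return 1

    # Imported here so the check above works even before PyPDF2 is needed;
    # assumes PyPDF2 3.x, where PdfFileMerger was renamed PdfMerger.
    from PyPDF2 import PdfMerger

    merger = PdfMerger()
    try:
        for file in files:
            merger.append(file)
            print(f"Processed file {file}")
        merger.write(output)
    except OSError as exc:
        # Covers reading an input or writing the output, e.g. a missing
        # output folder or a file locked by another Windows process.
        print(f"Merge failed: {exc}", file=sys.stderr)
        return 1
    finally:
        merger.close()
    return 0
```

Returning a status code rather than calling `sys.exit()` inside the helper keeps it easy to test. A caller would typically do `sys.exit(merge_pdfs(".", "merged.pdf"))`.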

Deprecated

I realize this is 4 years after you posted the question, but when I run the code, I get a deprecation error. I must replace PdfFileMerger with PdfMerger.

J_H · Answer 2 · 2025-02-17 17:35:41Z

Nice use of an f-string!

import

Minor nit: I find reading "glob glob" distracting, as it makes me think of Cookie Monster turning files into cookie crumbs. I prefer from glob import glob so the target code reads more smoothly.
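With that import, the listing line reads as below. On the question's ordering concern: `sorted()` compares names as strings, so the order is alphabetical but not numeric, and `10.pdf` sorts between `1.pdf` and `2.pdf`. The `natural_key` helper here is a hypothetical addition, not part of the original code. It compares runs of digits as integers.

```python
import re
from glob import glob
from typing import List, Union


def natural_key(name: str) -> List[Union[int, str]]:
    """Sort key that compares digit runs as numbers, so 2.pdf < 10.pdf."""
    # With a capturing group, re.split alternates text and digit runs:
    # "10.pdf" -> ["", "10", ".pdf"], so odd positions always hold digits.
    parts = re.split(r"(\d+)", name)
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


def list_pdfs(pattern: str = "*.pdf") -> List[str]:
    """Files matching `pattern`, in natural (numeric-aware) order."""
    return sorted(glob(pattern), key=natural_key)
```

If your files never go past 9.pdf, plain `sorted(glob("*.pdf"))` is enough. The key only matters once two-digit names appear.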

empty list

I agree with @toolic that the most likely failure mode is that we accidentally chdir into an empty directory, or into one that has text files but no PDF files. In that case we waste time prompting for an output filename, and then silently "succeed" by producing a useless, tiny result file.

Simplest fix would be

import os
from glob import glob
files = sorted(glob("*.pdf"))
assert files, f"No PDF files found in {os.getcwd()}"

Or write slightly fancier code, using if not files: raise ValueError(...)
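The "fancier" variant might look like the sketch below. One reason to prefer it is that Python skips `assert` statements entirely when run with `-O`. An assert-based check can therefore disappear, while an explicit `raise` always runs. The function name `find_pdfs` and its `directory` parameter are illustrative additions.

```python
import os
from glob import glob
from typing import List


def find_pdfs(directory: str = ".") -> List[str]:
    """Return the sorted .pdf paths in `directory`, or raise if there are none."""
    files = sorted(glob(os.path.join(directory, "*.pdf")))
    if not files:
        # Unlike assert, this check still runs under `python -O`.
        raise ValueError(f"No PDF files found in {os.path.abspath(directory)}")
    return files
```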

writable output directory

If the user happens to type in "out/result.pdf" and there's no out/ folder, then depending on your use case you might prefer the current "blow up!" behavior, which gives a good diagnostic. Or you might prefer that the script silently run "mkdir out" behind the scenes for you.
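If you opt for the automatic behavior, here is a minimal sketch using `pathlib`. The helper name `ensure_parent_dir` is hypothetical. You would call it on the output name just before `merger.write(...)`, and keeping the current behavior simply means not calling it.

```python
from pathlib import Path


def ensure_parent_dir(filename: str) -> Path:
    """Create the output file's parent folder if it is missing (like `mkdir out`)."""
    path = Path(filename)
    # parents=True also creates intermediate folders; exist_ok=True makes
    # this a no-op when the folder already exists (including ".").
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```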

Other failure modes seem rarer and already give a helpful diagnostic. One example is when result.pdf exists but we lack permission to write to it, perhaps because a Windows process holds a lock on it.

argv

Consider getting the merged filename from sys.argv rather than input(). Or use typer to accomplish the same thing with less effort.
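Here is a minimal sketch of the `sys.argv` route. It keeps `input()` as a fallback when no argument is given, so `python merge.py combined` skips the prompt and a bare `python merge.py` behaves as before. The function and script names are hypothetical.

```python
import sys
from typing import List, Optional


def merged_filename(argv: Optional[List[str]] = None) -> str:
    """Take the output name from the command line, else prompt as before."""
    args = sys.argv[1:] if argv is None else argv
    if args:
        # First positional argument is the merged file name.
        return args[0]
    return input("Enter merged file name: ")
```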
