Commit 2e77319

Add Ghostscript-based PDF compressor and update docs (fixes #129)
- Add pdf_compressor_ghostscript.py using open-source Ghostscript
- Update README.md with both legacy and recommended methods
- Update requirements.txt to note system dependencies
- Fixes issue #129: PDFTron/PDFNet is now commercial and requires license
- Provides free alternative with same functionality and API
1 parent 8fca152 commit 2e77319

3 files changed: +157 additions, -8 deletions

README.md: 47 additions & 7 deletions
@@ -1,8 +1,48 @@
 # [How to Compress PDF Files in Python](https://www.thepythoncode.com/article/compress-pdf-files-in-python)
-To run this:
-- `pip3 install -r requirements.txt`
-- To compress `bert-paper.pdf` file:
-```
-$ python pdf_compressor.py bert-paper.pdf bert-paper-min.pdf
-```
-This will spawn a new compressed PDF file under the name `bert-paper-min.pdf`.
+
+This directory contains two approaches:
+
+- Legacy (commercial): `pdf_compressor.py` uses PDFTron/PDFNet. PDFNet now requires a license key, and the old pip package is not freely available, so this script may not work without a license.
+- Recommended (open source): `pdf_compressor_ghostscript.py` uses Ghostscript to compress PDFs.
+
+## Ghostscript method (recommended)
+
+Prerequisite: install Ghostscript.
+
+- macOS (Homebrew):
+  - `brew install ghostscript`
+- Ubuntu/Debian:
+  - `sudo apt-get update && sudo apt-get install -y ghostscript`
+- Windows:
+  - Download and install from https://ghostscript.com/releases/
+  - Ensure `gswin64c.exe` (or `gswin32c.exe`) is in your PATH.
+
+No Python packages are required for this method, only Ghostscript.
+
+### Usage
+
+To compress `bert-paper.pdf` into `bert-paper-min.pdf` with the default quality (`power=2`):
+
+```
+python pdf_compressor_ghostscript.py bert-paper.pdf bert-paper-min.pdf
+```
+
+The optional quality level `[power]` controls the compression/quality trade-off (it maps to Ghostscript's `-dPDFSETTINGS`):
+
+- 0 = `/screen` (smallest, lowest quality)
+- 1 = `/ebook` (good quality)
+- 2 = `/printer` (high quality) [default]
+- 3 = `/prepress` (very high quality)
+- 4 = `/default` (Ghostscript default)
+
+Example:
+
+```
+python pdf_compressor_ghostscript.py bert-paper.pdf bert-paper-min.pdf 1
+```
+
+In testing, `bert-paper.pdf` (~757 KB) compressed to ~407 KB with `power=1`.
+
+## Legacy PDFNet method (requires a license)
+
+If you have a valid license and the PDFNet SDK installed, you can use the original `pdf_compressor.py` script. Note that the previously referenced `PDFNetPython3` pip package is not freely available and may not install via pip. Refer to the vendor's documentation for installation and licensing.
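
Both the README table and the script reduce `power` to a lookup into Ghostscript's `-dPDFSETTINGS` presets. The sketch below keeps that lookup and the argument list in a pure function, so the exact Ghostscript invocation can be inspected or unit-tested without Ghostscript installed. `build_gs_command` and `PDF_SETTINGS` are hypothetical names, not part of the commit; the flags are the same ones the script passes, but unlike the committed script (which silently falls back to `/printer`) this version rejects unknown levels.

```python
from typing import List

# Same mapping as the README: power level -> Ghostscript -dPDFSETTINGS preset.
PDF_SETTINGS = {
    0: "/screen",    # smallest output, lowest quality
    1: "/ebook",     # good quality
    2: "/printer",   # high quality (script default)
    3: "/prepress",  # very high quality
    4: "/default",   # Ghostscript's own default
}


def build_gs_command(gs: str, input_file: str, output_file: str, power: int = 2) -> List[str]:
    """Return the argv list for a Ghostscript pdfwrite run (hypothetical helper)."""
    if power not in PDF_SETTINGS:
        # Fail loudly instead of silently picking a preset.
        raise ValueError(f"power must be one of {sorted(PDF_SETTINGS)}, got {power!r}")
    return [
        gs,
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS={PDF_SETTINGS[power]}",
        "-dNOPAUSE",
        "-dBATCH",
        "-dQUIET",
        f"-sOutputFile={output_file}",
        input_file,
    ]


if __name__ == "__main__":
    # Show the command the README's `... bert-paper.pdf bert-paper-min.pdf 1` example runs.
    print(" ".join(build_gs_command("gs", "bert-paper.pdf", "bert-paper-min.pdf", 1)))
```

Running it prints `gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook -dNOPAUSE -dBATCH -dQUIET -sOutputFile=bert-paper-min.pdf bert-paper.pdf`. For scale, the README's measurement at `power=1` (~757 KB down to ~407 KB) corresponds to a "Compression Ratio" in the script's sense of roughly 1 - 407/757 ≈ 46%.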

pdf_compressor_ghostscript.py: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+import os
+import sys
+import subprocess
+import shutil
+
+
+def get_size_format(b, factor=1024, suffix="B"):
+    for unit in ["", "K", "M", "G", "T", "P", "E", "Z"]:
+        if b < factor:
+            return f"{b:.2f}{unit}{suffix}"
+        b /= factor
+    return f"{b:.2f}Y{suffix}"
+
+
+def find_ghostscript_executable():
+    candidates = [
+        shutil.which('gs'),
+        shutil.which('gswin64c'),
+        shutil.which('gswin32c'),
+    ]
+    for c in candidates:
+        if c:
+            return c
+    return None
+
+
+def compress_file(input_file: str, output_file: str, power: int = 2):
+    """Compress PDF using Ghostscript.
+
+    power:
+        0 -> /screen (lowest quality, highest compression)
+        1 -> /ebook (good quality)
+        2 -> /printer (high quality) [default]
+        3 -> /prepress (very high quality)
+        4 -> /default (Ghostscript default)
+    """
+    if not os.path.exists(input_file):
+        raise FileNotFoundError(f"Input file not found: {input_file}")
+    if not output_file:
+        output_file = input_file
+
+    initial_size = os.path.getsize(input_file)
+
+    gs = find_ghostscript_executable()
+    if not gs:
+        raise RuntimeError(
+            "Ghostscript not found. Install it and ensure 'gs' (Linux/macOS) "
+            "or 'gswin64c'/'gswin32c' (Windows) is in PATH."
+        )
+
+    settings_map = {
+        0: '/screen',
+        1: '/ebook',
+        2: '/printer',
+        3: '/prepress',
+        4: '/default',
+    }
+    pdfsettings = settings_map.get(power, '/printer')
+
+    cmd = [
+        gs,
+        '-sDEVICE=pdfwrite',
+        '-dCompatibilityLevel=1.4',
+        f'-dPDFSETTINGS={pdfsettings}',
+        '-dNOPAUSE',
+        '-dBATCH',
+        '-dQUIET',
+        f'-sOutputFile={output_file}',
+        input_file,
+    ]
+
+    try:
+        subprocess.run(cmd, check=True)
+    except subprocess.CalledProcessError as e:
+        print(f"Ghostscript failed: {e}")
+        return False
+
+    compressed_size = os.path.getsize(output_file)
+    ratio = 1 - (compressed_size / initial_size)
+    summary = {
+        "Input File": input_file,
+        "Initial Size": get_size_format(initial_size),
+        "Output File": output_file,
+        "Compressed Size": get_size_format(compressed_size),
+        "Compression Ratio": f"{ratio:.3%}",
+    }
+
+    print("## Summary ########################################################")
+    for k, v in summary.items():
+        print(f"{k}: {v}")
+    print("###################################################################")
+    return True
+
+
+if __name__ == '__main__':
+    if len(sys.argv) < 3:
+        print("Usage: python pdf_compressor_ghostscript.py <input.pdf> <output.pdf> [power 0-4]")
+        sys.exit(1)
+    input_file = sys.argv[1]
+    output_file = sys.argv[2]
+    power = int(sys.argv[3]) if len(sys.argv) > 3 else 2
+    ok = compress_file(input_file, output_file, power)
+    sys.exit(0 if ok else 2)
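
When `compress_file` receives an empty `output_file`, it falls back to `output_file = input_file`, but having Ghostscript read and write the same file in one run risks truncating or corrupting it. A minimal sketch of safe in-place compression, assuming a `compress(input_file, output_file)` callable that returns `True` on success (for example a small wrapper around `compress_file`): write to a temporary file in the same directory, and only then swap it over the original with `os.replace`. `compress_in_place` and `only_if_smaller` are hypothetical names, not part of the commit.

```python
import os
import tempfile
from typing import Callable


def compress_in_place(
    path: str,
    compress: Callable[[str, str], bool],
    only_if_smaller: bool = True,
) -> bool:
    """Compress `path` through a temporary file; return True if `path` was replaced.

    `compress(input_file, output_file)` is assumed to return True on success,
    e.g. `lambda i, o: compress_file(i, o, 1)` using the script above.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file beside the original so os.replace() stays on one
    # filesystem and cannot fail with a cross-device error.
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf", dir=directory)
    os.close(fd)  # the compressor reopens the path itself
    try:
        if not compress(path, tmp_path):
            return False
        if only_if_smaller and os.path.getsize(tmp_path) >= os.path.getsize(path):
            # pdfwrite output can be larger than the input; keep the original then.
            return False
        os.replace(tmp_path, path)  # overwrite the original with a single rename
        return True
    finally:
        # Remove the temporary file unless it has been moved over the original.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

Passing the compressor in as a callable also keeps the swap logic testable without Ghostscript: a stub that writes a small file stands in for the real call.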

requirements.txt: 7 additions & 1 deletion
@@ -1 +1,7 @@
-PDFNetPython3==8.1.0
+# No Python dependencies required for Ghostscript-based compressor.
+# System dependency: Ghostscript
+# - macOS: brew install ghostscript
+# - Debian: sudo apt-get install -y ghostscript
+# - Windows: https://ghostscript.com/releases/
+#
+# The legacy script (pdf_compressor.py) depends on PDFNet (commercial) and a license key.
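
With `PDFNetPython3` gone from `requirements.txt`, `pip install -r requirements.txt` no longer installs anything the compressor needs, so a setup or CI step may want to confirm that Ghostscript is present, and which version, before running the script. A minimal sketch, assuming the same executable names the script searches for (`gs`, `gswin64c`, `gswin32c`) and Ghostscript's `--version` option, which prints the version number and exits; `ghostscript_version` is a hypothetical helper, not part of the commit.

```python
import shutil
import subprocess
from typing import Iterable, Optional

# Executable names searched by pdf_compressor_ghostscript.py:
# `gs` on Linux/macOS, `gswin64c`/`gswin32c` console builds on Windows.
GS_NAMES = ("gs", "gswin64c", "gswin32c")


def ghostscript_version(names: Iterable[str] = GS_NAMES) -> Optional[str]:
    """Return the version reported by the first working Ghostscript on PATH, else None."""
    for name in names:
        exe = shutil.which(name)
        if exe is None:
            continue
        try:
            result = subprocess.run(
                [exe, "--version"], capture_output=True, text=True, timeout=10
            )
        except (OSError, subprocess.TimeoutExpired):
            continue  # found on PATH but could not be run; try the next name
        if result.returncode == 0:
            return result.stdout.strip()
    return None


if __name__ == "__main__":
    version = ghostscript_version()
    print(f"Ghostscript {version}" if version else "Ghostscript not found on PATH")
```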
