Commit dc641a9

Merge pull request avinashkranjan#1144 from zankrut20/pdf-watermark-remover
PDF Watermark Remover
2 parents 71fa076 + 263aaa3 commit dc641a9


5 files changed, +102 -0 lines changed

Lines changed: 67 additions & 0 deletions
```python
from skimage import io
from PyPDF2 import PdfFileReader
from pdf2image import convert_from_path
import numpy as np
import os
from PIL import Image
from fpdf import FPDF

pdfFile = input('PDF file location: ')
# Absolute directory of the input PDF (a bare file name would otherwise give '')
dirname = os.path.dirname(os.path.abspath(pdfFile))
outputFile = os.path.splitext(os.path.basename(pdfFile))[0]
pdf_reader = PdfFileReader(pdfFile)
pages = pdf_reader.getNumPages()
rang = int(pages) + 1

# Select a pixel as watermark when all three RGB channels lie strictly
# between 120 and 254 (light grey / pastel tones)
def select_pixel(r, g, b):
    return 120 < r < 254 and 120 < g < 254 and 120 < b < 254

# Remove the watermark by painting every selected pixel white
def handle(imgs):
    for i in range(imgs.shape[0]):
        for j in range(imgs.shape[1]):
            if select_pixel(imgs[i][j][0], imgs[i][j][1], imgs[i][j][2]):
                imgs[i][j][0] = imgs[i][j][1] = imgs[i][j][2] = 255
    return imgs

images = convert_from_path(pdfFile)

# Folder for the cleaned page images; os.path.join builds the same path on
# Windows, macOS and Linux (the original mixed '\img' and 'img/')
img_dir = os.path.join(dirname, 'img')
try:
    os.mkdir(img_dir)
except FileExistsError:
    print('Folder exists')

index = 0
for img in images:
    index += 1
    img = np.array(img)
    print(img.shape)
    img = handle(img)
    io.imsave(os.path.join(img_dir, 'img' + str(index) + '.jpg'), img)
    print(index)

# Merge the cleaned images into a single PDF
pdf = FPDF()
w, h = 0, 0

for i in range(1, rang):
    fname = os.path.join(img_dir, 'img%d.jpg' % i)
    if os.path.exists(fname):
        if i == 1:
            # Size the pages in points to match the first page image
            cover = Image.open(fname)
            w, h = cover.size
            pdf = FPDF(unit="pt", format=[w, h])
        pdf.add_page()
        pdf.image(fname, 0, 0, w, h)
    else:
        print("File not found:", fname)

pdf.output(os.path.join(dirname, outputFile + '_rw.pdf'), "F")
print("done")
```
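The `handle()` loop above tests every pixel from Python, which is slow when a page is rendered at pdf2image's default 200 dpi (a few million pixels per page). A minimal sketch of the same selection rule as a single NumPy boolean mask follows, assuming the page arrives as an RGB `uint8` array like the one `np.array()` produces from a pdf2image page. `handle_vectorized`, `low` and `high` are hypothetical names, not part of the script, and unlike `handle()` the sketch returns a modified copy instead of editing the array in place.

```python
import numpy as np


def handle_vectorized(img: np.ndarray, low: int = 120, high: int = 254) -> np.ndarray:
    """Hypothetical vectorized variant of handle(): whiten every pixel whose
    R, G and B values all lie strictly between `low` and `high`."""
    rgb = img[..., :3]
    # Boolean mask of shape (height, width): True where all three channels pass
    mask = np.all((rgb > low) & (rgb < high), axis=-1)
    out = img.copy()
    # Paint the selected pixels white in the copy only
    out[mask, :3] = 255
    return out


if __name__ == "__main__":
    page = np.array(
        [[[200, 200, 200], [0, 0, 0]],
         [[255, 0, 0], [130, 250, 180]]],
        dtype=np.uint8,
    )
    print(handle_vectorized(page))
```

The thresholds are the same as in `select_pixel()`, so the two approaches should whiten the same pixels; the mask simply replaces the Python double loop with one array operation.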

PDF Watermark Remover/img1.jpg

480 KB

PDF Watermark Remover/readme.md

Lines changed: 28 additions & 0 deletions
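````markdown
# PDF-Watermark-Remover

The script PDF-Watermark-Remover removes watermarks from PDF files. It converts each page to an image, selects watermark pixels by RGB color range (a pixel is selected when its red, green and blue values all lie strictly between 120 and 254), paints the selected pixels white, and merges the cleaned pages into a new PDF whose name ends in `_rw.pdf`.

## Setup instructions

1. Download the Python script file ("PDF-Watermark-Remover.py")
2. Install the dependencies with `pip install -r requirements.txt`; `pdf2image` also needs the Poppler utilities installed on the system
3. Open cmd/PowerShell (Windows) or a terminal (macOS/Linux), type the line below and press Enter
   - For Windows
     ```
     python <path of "PDF-Watermark-Remover.py">
     ```
   - For macOS/Linux
     ```
     python3 <path of "PDF-Watermark-Remover.py">
     ```
4. When prompted with `PDF file location:`, enter the path of the PDF file

## Output

| Original Page | Output Page |
| --- | --- |
| <img src="sample001.png" width="2000"> | <img src="img1.jpg" width="2000"> |

## Author(s)

[Zankrut Goyani](https://github.com/zankrut20)

## Disclaimers, if any

The script only whitens pixels whose R, G and B values all lie strictly between 120 and 254, so it will not remove watermarks in pure base colors such as red (255, 0, 0), green (0, 255, 0) and blue (0, 0, 255). Page content in the same light color range (for example light grey text or pale areas of photos) is whitened as well.
````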
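The disclaimer follows directly from the selection rule in `select_pixel()`: a pixel is treated as watermark only when every channel is strictly between 120 and 254. The sketch below restates that rule as a standalone function (a condensed copy written with chained comparisons, not an import from the script) and checks a few illustrative colors.

```python
def select_pixel(r: int, g: int, b: int) -> bool:
    # Same bounds as the script: each channel strictly between 120 and 254
    return 120 < r < 254 and 120 < g < 254 and 120 < b < 254


# Illustrative sample colors, not taken from a real document
samples = {
    "red (255, 0, 0)": (255, 0, 0),
    "green (0, 255, 0)": (0, 255, 0),
    "blue (0, 0, 255)": (0, 0, 255),
    "light grey (200, 200, 200)": (200, 200, 200),
    "black text (0, 0, 0)": (0, 0, 0),
    "white paper (255, 255, 255)": (255, 255, 255),
}

for name, rgb in samples.items():
    print(f"{name}: {'removed' if select_pixel(*rgb) else 'kept'}")
```

Only the light grey sample is removed: each base color has at least one channel at 0 or 255, and pure white (255) is also outside the open interval, so it is left untouched.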

PDF Watermark Remover/requirements.txt

Lines changed: 7 additions & 0 deletions
```text
scikit_image==0.18.1
pdf2image==1.15.1
fpdf==1.7.2
numpy==1.19.5
Pillow==8.2.0
PyPDF2==1.26.0
```
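In the script, `fpdf` is used only to stitch the cleaned page images back into one PDF. As an alternative sketch (not what the script does), Pillow, which is already pinned above, can write a multi-page PDF directly through `Image.save(..., save_all=True, append_images=...)`. `images_to_pdf` is a hypothetical helper name. With Pillow's default PDF resolution of 72 dpi, each page gets one point per pixel, which matches the script's `FPDF(unit="pt", format=[w, h])` sizing; unlike the script, which reuses the first page's size for every page, each page here takes its own image size.

```python
from io import BytesIO

from PIL import Image


def images_to_pdf(pages, target) -> None:
    """Hypothetical helper: write PIL images as a multi-page PDF, one image per page.

    `target` may be a file path or a binary file-like object.
    """
    # Normalise every page to RGB so all pages share one mode the PDF writer accepts
    rgb_pages = [p.convert("RGB") for p in pages]
    first, rest = rgb_pages[0], rgb_pages[1:]
    first.save(target, format="PDF", save_all=True, append_images=rest)


if __name__ == "__main__":
    # Two blank demo pages; in the script these would be the cleaned page images
    demo = [Image.new("RGB", (595, 842), "white"),
            Image.new("RGB", (595, 842), (230, 230, 230))]
    buf = BytesIO()
    images_to_pdf(demo, buf)
    print(len(buf.getvalue()), "bytes written")
```

Writing straight from the in-memory page images would also skip the intermediate JPEG files, which avoids a second round of lossy compression.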

PDF Watermark Remover/sample001.png

220 KB

0 commit comments
