Commit dc641a9

Merge pull request avinashkranjan#1144 from zankrut20/pdf-watermark-remover

PDF Watermark Remover

2 parents 71fa076 + 263aaa3 commit dc641a9

5 files changed: +102 -0 lines changed

`‎PDF Watermark Remover/PDF-Watermark-Remover.py`

Lines changed: 67 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,67 @@`
	`1`	`+from skimage import io`
	`2`	`+from PyPDF2 import PdfFileReader`
	`3`	`+from pdf2image import convert_from_path`
	`4`	`+import numpy as np`
	`5`	`+import os`
	`6`	`+from PIL import Image`
	`7`	`+from fpdf import FPDF`
	`8`	`+import shutil`
	`9`	`+`
	`10`	`+pdfFile = input('PDF file location: ')`
	`11`	`+dirname = os.path.dirname(os.path.normpath(pdfFile))`
	`12`	`+outputFile = os.path.basename(pdfFile)`
	`13`	`+outputFile = os.path.splitext(outputFile)[0]`
	`14`	`+pdf_reader = PdfFileReader(pdfFile)`
	`15`	`+pages = pdf_reader.getNumPages()`
	`16`	`+rang = int(pages) + 1`
	`17`	`+`
	`18`	`+# Select the pixel from the extracted images of pdf pages`
	`19`	`+def select_pixel(r, g, b):`
	`20`	`+    if r > 120 and r < 254 and g > 120 and g < 254 and b > 120 and b < 254:`
	`21`	`+        return True`
	`22`	`+    else:`
	`23`	`+        return False`
	`24`	`+`
	`25`	`+# Handling of images for removing the watermark`
	`26`	`+def handle(imgs):`
	`27`	`+    for i in range(imgs.shape[0]):`
	`28`	`+        for j in range(imgs.shape[1]):`
	`29`	`+            if select_pixel(imgs[i][j][0], imgs[i][j][1], imgs[i][j][2]):`
	`30`	`+                imgs[i][j][0] = imgs[i][j][1] = imgs[i][j][2] = 255`
	`31`	`+    return imgs`
	`32`	`+`
	`33`	`+images = convert_from_path(pdfFile)`
	`34`	`+`
	`35`	`+try:`
	`36`	`+    os.mkdir(os.path.join(dirname, 'img'))`
	`37`	`+except FileExistsError:`
	`38`	`+    print('Folder exists')`
	`39`	`+index = 0`
	`40`	`+for img in images:`
	`41`	`+    index += 1`
	`42`	`+    img = np.array(img)`
	`43`	`+    print(img.shape)`
	`44`	`+    img = handle(img)`
	`45`	`+    io.imsave(os.path.join(dirname, 'img', 'img' + str(index) + '.jpg'), img)`
	`46`	`+    print(index)`
	`47`	`+`
	`48`	`+# Merging images into a single PDF`
	`49`	`+pdf = FPDF()`
	`50`	`+sdir = os.path.join(dirname, "img")`
	`51`	`+w, h = 0, 0`
	`52`	`+`
	`53`	`+for i in range(1, rang):`
	`54`	`+    fname = os.path.join(sdir, "img%d.jpg" % i)`
	`55`	`+    if os.path.exists(fname):`
	`56`	`+        if i == 1:`
	`57`	`+            cover = Image.open(fname)`
	`58`	`+            w, h = cover.size`
	`59`	`+            pdf = FPDF(unit="pt", format=[w, h])`
	`60`	`+        image = fname`
	`61`	`+        pdf.add_page()`
	`62`	`+        pdf.image(image, 0, 0, w, h)`
	`63`	`+    else:`
	`64`	`+        print("File not found:", fname)`
	`65`	`+    # print("processed %d" % i)`
	`66`	`+pdf.output(os.path.join(dirname, outputFile + '_rw.pdf'), "F")`
	`67`	`+print("done")`
	`67`	`+print("done")`
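
The nested per-pixel loop in `handle` visits every pixel in Python, which can be slow on full-page rasters. The following is a minimal sketch of the same thresholding rule written with a NumPy boolean mask. It is not part of this commit: the function name `remove_watermark` and the `low`/`high` parameters are hypothetical, and it assumes an H x W x 3 `uint8` array like the one produced by `np.array(img)` in the script.

```python
import numpy as np


def remove_watermark(page: np.ndarray, low: int = 120, high: int = 254) -> np.ndarray:
    """Return a copy of an H x W x 3 image with watermark-range pixels set to white.

    Hypothetical helper: mirrors the rule in select_pixel, where a pixel is
    selected only if all three channels lie strictly between low and high.
    """
    rgb = page[..., :3]
    # Boolean H x W mask: True where every channel is inside (low, high).
    mask = np.all((rgb > low) & (rgb < high), axis=-1)
    cleaned = page.copy()
    # Paint the selected pixels white, leaving everything else untouched.
    cleaned[mask, :3] = 255
    return cleaned


if __name__ == "__main__":
    demo = np.array(
        [[[200, 200, 200], [0, 0, 0]],
         [[255, 0, 0], [255, 255, 255]]],
        dtype=np.uint8,
    )
    print(remove_watermark(demo).tolist())
```

Unlike `handle`, this sketch returns a copy rather than modifying its input in place, which makes it easier to compare the page before and after.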

`‎PDF Watermark Remover/img1.jpg`

480 KB

`‎PDF Watermark Remover/readme.md`

Lines changed: 28 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,28 @@`
	`1`	`+# PDF-Watermark-Remover`
	`2`	`+`
	`3`	`+The PDF-Watermark-Remover script removes watermarks from PDF files. It renders each page to an image, selects watermark pixels by RGB range (all three channels between 120 and 254), sets those pixels to white, and merges the cleaned pages back into a single PDF.`
	`4`	`+`
	`5`	`+## Setup instructions`
	`6`	`+`
	`7`	`+1. Download the Python script file ("PDF-Watermark-Remover.py")`
	`8`	`+2. Open cmd/PowerShell/a shell, type the line below, and press Return`
	`9`	`+ - For Windows`
	`10`	+ ```
	`11`	`+ python <path of "PDF-Watermark-Remover.py">`
	`12`	+ ```
	`13`	`+ - For macOS/Linux`
	`14`	+ ```
	`15`	`+ python3 <path of "PDF-Watermark-Remover.py">`
	`16`	+ ```
	`17`	`+`
	`18`	`+## Output`
	`19`	`+`
	`20`	`+| Original Page | Output Page |`
	`21`	`+| --- | --- |`
	`22`	`+| <img src="sample001.png" width="2000"> | <img src="img1.jpg" width="2000"> |`
	`23`	`+`
	`24`	`+## Author(s)`
	`25`	`+[Zankrut Goyani](https://github.com/zankrut20)`
	`26`	`+`
	`27`	`+## Disclaimers, if any`
	`28`	`+The script will not remove watermarks in pure base colors such as Red (255, 0, 0), Green (0, 255, 0), and Blue (0, 0, 255).`
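
The disclaimer follows directly from the selection rule in the script: a pixel is treated as watermark only when all three channels fall strictly between 120 and 254. The sketch below restates that rule in a compact form and checks a few illustrative colors. The `samples` names and values are examples chosen here, not data from the repository.

```python
def select_pixel(r, g, b, low=120, high=254):
    """Same rule as the script: every channel must lie strictly inside (low, high)."""
    return all(low < channel < high for channel in (r, g, b))


# Illustrative colors only; these are not taken from the sample images.
samples = {
    "light gray watermark": (200, 200, 200),
    "pure red": (255, 0, 0),
    "pure green": (0, 255, 0),
    "pure blue": (0, 0, 255),
    "white background": (255, 255, 255),
    "black text": (0, 0, 0),
}

if __name__ == "__main__":
    for name, rgb in samples.items():
        print(f"{name}: selected={select_pixel(*rgb)}")
```

Only the light gray sample is selected. Pure base colors fail because at least one channel is 0 or 255, and white and black fail for the same reason, which is why the page background and dark text are left alone.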

`‎PDF Watermark Remover/requirements.txt`

Lines changed: 7 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,7 @@`
	`1`	`+scikit_image==0.18.1`
	`2`	`+pdf2image==1.15.1`
	`3`	`+fpdf==1.7.2`
	`4`	`+numpy==1.19.5`
	`5`	`+Pillow==8.2.0`
	`6`	`+PyPDF2==1.26.0`
	`7`	`+skimage==0.0`

`‎PDF Watermark Remover/sample001.png`

220 KB

0 commit comments
