Checking that 2 pdf are identical (md5 a solution?)

Peter Otten __peter__ at web.de
Sat Jul 24 11:50:31 EDT 2010


rlevesque wrote:
> Hi
>
> I am working on a program that generates various pdf files in the
> /results folder.
>
> "scenario1.pdf" results from scenario1
> "scenario2.pdf" results from scenario2
> etc
>
> Once I am happy with scenario1.pdf and scenario2.pdf files, I would
> like to save them in the /check folder.
>
> Now after having developed/modified the program to produce
> scenario3.pdf, I would like to be able to re-generate
> files
> /results/scenario1.pdf
> /results/scenario2.pdf
>
> and compare them with
> /check/scenario1.pdf
> /check/scenario2.pdf
>
> I tried using the md5 module to compare these files but md5 reports
> differences even though the code has *not* changed at all.
>
> Is there a way to compare 2 pdf files generated at different time but
> identical in every other respect and validate by program that the
> files are identical (for all practical purposes)?

Here's a naive approach, but it may be good enough for your purpose.
I've printed the same small text into 1.pdf and 2.pdf.

(Bad practice warning: this session is slightly doctored; I hope I haven't
introduced an error.)

>>> a = open("1.pdf").read()
>>> b = open("2.pdf").read()
>>> diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
>>> len(diff)
2
>>> diff
[160, 161]
>>> a[150:170]
'0100724151412)\n>>\nen'
>>> a[140:170]
'nDate (D:20100724151412)\n>>\nen'
>>> a[130:170]
')\n/CreationDate (D:20100724151412)\n>>\nen'
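
The session above is Python 2 and reads the files in text mode. For Python 3, the same offset hunt might look roughly like the sketch below. The name diff_offsets is a hypothetical helper, not something from the session, and the files are opened in binary mode on the assumption that PDFs should be compared byte for byte.

```python
def diff_offsets(path_a, path_b):
    """Return the byte offsets at which the two files differ."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a = fa.read()
        b = fb.read()
    # Iterating over bytes yields ints in Python 3, so x != y compares byte values.
    offsets = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    # zip() stops at the shorter input; count any trailing bytes as differences too.
    offsets.extend(range(min(len(a), len(b)), max(len(a), len(b))))
    return offsets
```

Slicing the file contents around the reported offsets, as done with a[130:170] above, then shows which field is responsible.
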
OK, let's ignore "lines" starting with "/CreationDate " for our custom 
comparison function:
>>> def equal_pdf(fa, fb):
...     with open(fa) as a:
...         with open(fb) as b:
...             for la, lb in izip_longest(a, b, fillvalue=""):
...                 if la != lb:
...                     if not la.startswith("/CreationDate "): return False
...                     if not lb.startswith("/CreationDate "): return False
...     return True
...
>>> from itertools import izip_longest
>>> equal_pdf("1.pdf", "2.pdf")
True
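
For Python 3, where izip_longest is spelled zip_longest, the same idea might be sketched as follows. Since the original question was about md5, the sketch also includes a hypothetical pdf_digest helper that hashes a file while skipping the volatile lines, so the digest of a /results file can be compared with the digest of its /check counterpart. The names VOLATILE_PREFIXES and pdf_digest are assumptions for illustration. Only "/CreationDate " comes from the session above; other generators may vary further fields (for example /ModDate or the trailer /ID), and those would have to be added to the tuple after inspecting the actual files.

```python
import hashlib
from itertools import zip_longest

# Line prefixes to ignore. Only "/CreationDate " is taken from the session;
# extend this tuple if your PDF generator writes other time-dependent lines.
VOLATILE_PREFIXES = (b"/CreationDate ",)


def equal_pdf(path_a, path_b, ignore=VOLATILE_PREFIXES):
    """Compare two files line by line, tolerating differences in ignored lines."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        for la, lb in zip_longest(a, b, fillvalue=b""):
            if la != lb:
                # A mismatch is acceptable only if both lines are volatile.
                if not (la.startswith(ignore) and lb.startswith(ignore)):
                    return False
    return True


def pdf_digest(path, ignore=VOLATILE_PREFIXES):
    """Return the md5 hex digest of a file with the ignored lines left out."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for line in f:
            if not line.startswith(ignore):
                md5.update(line)
    return md5.hexdigest()
```

Like the original, this only works when the varying data sits on a line of its own in an uncompressed part of the file; it is a naive check, not a general PDF comparison.
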
Peter

