I forgot the password to my PDF. I remembered a few characters, so I wrote a simple password-cracker program in Python. Is the document lost, given the number of combinations? Out of curiosity, how could I speed up this program as much as possible?
from pikepdf import open
from itertools import product
from math import factorial
c = '12A3ca9€'
c = sorted(set(c))
length = len(c)
total = ( (1-(length**(length+1)))/(1-length) ) - 1
s = 0
for s2 in range(length):
    s2 += 1
    m = product(c, repeat=s2)
    for i in m:
        try:
            with open(r"C:\Users\User\Desktop\Document.pdf", password=''.join(i)) as pdf:
                print(len(pdf.pages))
                print(''.join(i))
                exit(0)
        except:
            pass
        s+=1
        print(100*s/total)
3 Answers
Your code will never actually test any password, since `password` is not a valid argument for `open()`. Thus every call to it will throw a `TypeError`, which is swallowed by your bare `except` clause. Read up on proper PDF libraries that support decryption by password, and don't ever use bare `except` statements.
- No, you are wrong. I get an error: "pdf = Pdf._open( pikepdf._qpdf.PasswordError: C:\Users\User\Desktop\Document.pdf: invalid password". I also tried locking a new PDF file and used the same program to unlock it: I got no exception, and I could open the new PDF from my code perfectly. `password` is a valid argument. Notice the import statement at the beginning; we are not talking about the same `open()`. Anyway, maybe it's bad practice to import functions with the same name as built-ins? – gabriel, Oct 18, 2022 at 16:10
- Oh! Actually, one line of code is missing! My bad, I didn't write `from pikepdf import open`, but it's in my code. Apologies. – gabriel, Oct 18, 2022 at 16:14
- Please don't change your code once an answer has been posted. See #5: codereview.stackexchange.com/tour – Richard Neumann, Oct 18, 2022 at 19:04
- I didn't change the code. If you look at the edits, the line was there from the beginning. – gabriel, Oct 18, 2022 at 19:15
- I think that says something about using names that are the same as built-in functions. Perhaps `from pikepdf import open as pdf_open` would have saved the code from silently changing meaning? – Toby Speight, Oct 19, 2022 at 8:34
Rewrite it in C.
On a serious note, performance of your algorithm can be increased but not by a lot, since the bottleneck is definitely trying to open the file over and over.
[Edit: this is wrong in a weird way; please refer to @RichardNeumann's answer!]
Remove increments and printing of `s`: it runs on each iteration and doesn't contribute to anything.
You do `s2 += 1` even though `s2` is already being increased by the `for` loop. This is very misleading.
[Edited. Thanks to @SylvainD!]
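To see what the reassignment actually does, here is a minimal sketch (not taken from the question; the names `seen` and `clearer` are only illustrative). It suggests that the loop variable is rebound from `range` on every iteration, so the `+= 1` shifts the values rather than skipping any:

```python
# The loop variable is rebound from the range on every iteration,
# so changing it inside the body does not skip any values.
seen = []
for s2 in range(4):
    s2 += 1              # shifts 0..3 to 1..4 for this iteration only
    seen.append(s2)

print(seen)

# Same values, stated directly in the range.
clearer = list(range(1, 5))
print(clearer)
```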
If you think that every symbol appears only once, you can reduce the number of possibilities significantly.
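As a rough illustration of how much this helps, the sketch below counts the candidates for the 8 distinct characters from the question, with and without repeated symbols. It assumes Python 3.8+ for `math.perm`; the helper name `candidates_without_repeats` is made up for this example.

```python
from itertools import permutations
from math import perm

chars = sorted(set('12A3ca9€'))   # the 8 distinct characters from the question
n = len(chars)

# Repetition allowed: n**k candidates of length k (what product() generates).
with_repeats = sum(n**k for k in range(1, n + 1))

# Each symbol at most once: n! / (n - k)! candidates of length k.
without_repeats = sum(perm(n, k) for k in range(1, n + 1))

# The closed form used for `total` in the question is the same geometric sum.
closed_form = (1 - n**(n + 1)) // (1 - n) - 1

print(with_repeats, without_repeats)   # 19173960 109600


def candidates_without_repeats(chars, max_length):
    """Yield every string of length 1..max_length using each char at most once."""
    for k in range(1, max_length + 1):
        for combo in permutations(chars, k):
            yield ''.join(combo)
```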
If this doesn't work, I've read that this tool has a speed of 100K attempts per second at cracking PDF passwords.
P.S. Please name your variables according to what they actually represent, even though it's just a small script. I had a hard time reading this.
- Hello, regarding the `s2 += 1`: it is very misleading but not wrong as such. The `s2` value would actually go through all values. I'd suggest trying `for i in range(4): i += 1; print(i)`. Your comment assumes that the behavior is the one from a standard C loop, but it doesn't quite work that way. – SylvainD, Oct 18, 2022 at 7:01
- There's no bottleneck on opening the files, since the call to `open` will immediately fail with a `TypeError` due to the unsupported kwarg `password`. So the code fails lightning fast. :D – Richard Neumann, Oct 18, 2022 at 7:50
- @RichardNeumann wow, I just assumed it has that parameter. – QuasiStellar, Oct 18, 2022 at 8:02
- @SylvainD ooh that's right, since it uses local copies of `i`. – QuasiStellar, Oct 18, 2022 at 8:05
- @RichardNeumann Yes, one line of code wasn't present in the code view; I changed it. There was an import statement. – gabriel, Oct 18, 2022 at 16:18
Review of the Python code
Various details about the Python code itself
Your code looks good and makes proper use of various nice features of Python: data types like `set` and modules like `itertools`.
- The `m` variable is not that useful.
- The auto-increment of `s2` is slightly misleading. Here are two alternatives: directly use `s2+1` to get `product(c, repeat=s2+1)`, or get `s2` by iterating over a different range: `for s2 in range(1, length+1)`.
- Most variable names convey no actual meaning: `c` is a list (maybe `lst` or `chars`), `s` is a counter (maybe `c` or `nb`), `s2` is a length...
- The way `total` is computed probably deserves some explanation.
- `math.factorial` is not used anymore.
- `''.join(i)` could be computed just once, before the `try`.
- `r"C:\Users\User\Desktop\Document.pdf"` could probably be in a constant.
- Bare `except` clauses are usually frowned upon, as they usually catch more than expected and also tell the reader nothing about what is being caught. See "What is wrong with using a bare `except`?" for more details.
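Put together, the points above could look roughly like the sketch below. It is one possible arrangement rather than a definitive rewrite: the names `PDF_PATH`, `CHARS`, `candidates`, `opens_with` and `find_password` are invented for this example, and it assumes that `pikepdf.open` raises `pikepdf.PasswordError` for a wrong password, as the error message quoted in the comments suggests.

```python
from itertools import product

PDF_PATH = r"C:\Users\User\Desktop\Document.pdf"   # path from the question
CHARS = sorted(set('12A3ca9€'))                    # characters from the question


def candidates(chars, max_length):
    """Yield every string over chars of length 1..max_length."""
    for length in range(1, max_length + 1):
        for combo in product(chars, repeat=length):
            yield ''.join(combo)


def opens_with(path, password):
    """Return True if pikepdf can open the PDF at path with this password."""
    # Imported here so the rest of the sketch runs without pikepdf installed.
    import pikepdf
    try:
        with pikepdf.open(path, password=password):
            return True
    except pikepdf.PasswordError:   # only the "wrong password" case is caught
        return False


def find_password(check, chars, max_length):
    """Return the first candidate accepted by check, or None if none is."""
    for password in candidates(chars, max_length):
        if check(password):
            return password
    return None
```

A call would then look like `find_password(lambda pw: opens_with(PDF_PATH, pw), CHARS, len(CHARS))`. Passing the check as a function keeps the search loop testable without a real PDF.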
Comments about the behavior of the program
If I wanted to use such a tool, here are a few aspects of its behavior that I would most probably change.
- Calling `print` at each iteration will take some time and not give much information to the user. An alternative could be to print `s2` instead: "About to try passwords of length X - Y combinations to be tested".
- Stopping at strings whose length equals the number of distinct characters (`len(sorted(set(c)))`) seems pretty arbitrary. My suggestion would be to keep going with `itertools.count` instead of `range`.
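Both suggestions can be sketched together as follows. This is only an illustration: the function name `candidates_unbounded` and its `report` parameter are made up here, and the generator never stops on its own, so the caller has to break out once a password is found (the demo uses `islice` to take just a few values).

```python
from itertools import count, islice, product


def candidates_unbounded(chars, report=print):
    """Yield candidates of length 1, 2, 3, ... with no upper bound."""
    for length in count(1):
        # One message per length instead of one per attempt.
        report(f"About to try passwords of length {length} - "
               f"{len(chars) ** length} combinations to be tested")
        for combo in product(chars, repeat=length):
            yield ''.join(combo)


# Small demonstration with a two-character alphabet.
messages = []
first_seven = list(islice(candidates_unbounded('ab', report=messages.append), 7))
print(first_seven)
```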