I just wrote a little Python 3 program to spit out every possible combination of a set of 99 characters. It gets the job done, but I would be very interested in what you think of it.
I am just a few days into Python, so I would be grateful for even seemingly obvious advice.
import sys
# List of 99 characters and a blank string:
lib=["","0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","°","!","\"","§","$","%","&","/","(",")","=","ß"," ́","`","+","#","-",".",",",">","<","@","€","|","^","~","–","{","[","]","}","Ä","Ö","Ü","ä","ö","ü"]
# 10 counters for up to 10-digit-combinations:
counter0=-1
counter1=0
counter2=0
counter3=0
counter4=0
counter5=0
counter6=0
counter7=0
counter8=0
counter9=0
# Repetitive if-statements adding to the counters:
for i in range(sys.maxsize**99999):
counter0+=1
if counter0>99:
counter0=counter0*0
counter1+=1
elif counter1>99:
counter1=counter1*0
counter2+=1
elif counter2>99:
counter2=counter2*0
counter3+=1
elif counter3>99:
counter3=counter3*0
counter4+=1
elif counter4>99:
counter4=counter4*0
counter5+=1
elif counter5>99:
counter5=counter5*0
counter6+=1
elif counter6>99:
counter6=counter6*0
counter7+=1
elif counter7>99:
counter7=counter7*0
counter8+=1
elif counter8>99:
counter8=counter8*0
counter9+=1
elif counter9>99:
print("DONE.")
# Printing the translation from counters to character - and deleting the former output so it stays in one line:
else:
print(lib[counter0]+lib[counter1]+lib[counter2]+lib[counter3]+lib[counter4]+lib[counter5]+lib[counter6]+lib[counter7]+lib[counter8]+lib[counter9], end="\r")
sys.stdout.write("\b"*10+" "*10+"\b"*10)
3 Answers 3
We can convert a plain string to a list rather than maintain a list for the characters.
It is far easier to modify and read the following then a list.
lib = [''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß ́`+#-.,><@€|^~–{[]}ÄÖÜäöü')
When we have
counter0
,counter1
, ...,countern
it's a hint that we should use a list.counters = [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
We can then use
counters[0]
as a drop in replacement forcounter0
.Now that we have
counters
which is a list we can simplify your print from the following.print(lib[counters[0]] + lib[counters[1]] + lib[counters[2]] + lib[counters[3]] + > lib[counters[4]] + lib[counters[5]] + lib[counters[6]] + lib[counters[7]] + lib[counters[8]] + lib[counters[9]], end="\r")
We can use a for loop to loop through counters, index
lib
and print the character. We'll useend=""
to get the same format that you have. Since we've changed from"\r"
to""
we'll need to print that afterwards.for counter in counters: print(lib[counter], end="") print(end="\r")
It would be better to use
len(lib)
rather than hard coding99
in your ifs. If we change the content oflib
it's much easier to edit justlib
rather thanlib
and 10 99's.Rather than
counter0=counter0*0
it would make more sense to remove the multiplication and just set the value to 0.counter0 = 0
It's convention in Python to put a space either side of operators. This means
a+b
should instead bea + b
. This is as it makes it easier to see what is and is not an operator and either side of an operator.It's convention in Python to use
_
as a 'throw away' variable. This means it's normal to use_
rather thani
in your for loop.
Together this gets:
import sys
lib = [''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß ́`+#-.,><@€|^~–{[]}ÄÖÜäöü')
counters = [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
for _ in range(sys.maxsize**99999):
counters[0] += 1
if counters[0] >= len(lib):
counters[0] = 0
counters[1] += 1
elif counters[1] >= len(lib):
counters[1] = 0
counters[2] += 1
elif counters[2] >= len(lib):
counters[2] = 0
counters[3] += 1
elif counters[3] >= len(lib):
counters[3] = 0
counters[4] += 1
elif counters[4] >= len(lib):
counters[4] = 0
counters[5] += 1
elif counters[5] >= len(lib):
counters[5] = 0
counters[6] += 1
elif counters[6] >= len(lib):
counters[6] = 0
counters[7] += 1
elif counters[7] >= len(lib):
counters[7] = 0
counters[8] += 1
elif counters[8] >= len(lib):
counters[8] = 0
counters[9] += 1
elif counters[9] >= len(lib):
print("DONE.")
else:
for counter in counters:
print(lib[counter], end="")
print(end="\r")
sys.stdout.write("\b"*10 + " "*10 + "\b"*10)
There are still some changes that we can make to your code to make it easier to work with. These are a bit more advanced so don't worry if you don't get them straight away.
We can change your big
if
elif
block into a singlefor
loop.
Lets see what we have so far:if counters[0] > len(lib): counters[0] = 0 counters[1] += 1
We know that this section is repeated each time for each index. So we can make this generic by changing
0
toindex
and1
toindex + 1
.if counters[index] >= len(lib): counters[index] = 0 counters[index + 1] += 1
Now we just need to loop over
range(len(counters) - 1)
to duplicate the block 9 times.for index in range(len(counters) - 1): if counters[index] >= len(lib): counters[index] = 0 counters[index + 1] += 1
We can use some Python sugar to make your print for loop 'cleaner'. Firstly we can remove all the
print
s by building a list.combination = [] for counter in counters: combination.append(lib[counter])
From here we can join all the strings together with
"".join
and pass it toprint
like you did before. This will join the list by empty strings so it converts it's like manually doingcombination[0] + combination[1] + ...
.print("".join(combination), end="\r")
We can then use a list comprehension to build
combination
in one line. This is just syntaxtic sugar and is the same as the for loop we used before. It's just different, cleaner, syntax.combination = [lib[counter] for counter in counters]
We can use either a
while True
loop oritertools.count
rather thanrange(sys.maxsize**99999)
to loop infinitely.while True: counters[0] += 1
import itertools for _ in range(itertools.count()): counters[0] += 1
We can probably just use
print
rather thansys.stdout.write
.To make it so there is no newline we can pass
end=""
. This however doesn't flush the stream straight away, and so we need to passflush=True
.
lib = [''] + list('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß ́`+#-.,><@€|^~–{[]}ÄÖÜäöü')
counters = [-1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
while True:
counters[0] += 1
for index in range(len(counters) - 1):
if counters[index] >= len(lib):
counters[index] = 0
counters[index + 1] += 1
if counters[9] >= len(lib):
print("DONE.")
else:
print("".join([lib[counter] for counter in counters]), end="\r")
print("\b"*10 + " "*10 + "\b"*10, end="", flush=True)
-
1\$\begingroup\$ Very nice answer! One thing I'd add, is that this kind of function seems to be much more useful as a generator. Maybe move the code into a function and a main where the desired amount of data is printed from the generator. \$\endgroup\$jaaq– jaaq2020年10月05日 07:17:48 +00:00Commented Oct 5, 2020 at 7:17
-
2\$\begingroup\$ @jaaq Thank you! I would normally agree with you 110% however I feel bringing generator functions into the mix may cause significant confusion for the OP. I know the OP wants the code to run infinitely and figuring out two things at the some time is likely to be harder than just one. (generator functions and making it run infinitely) After the OP has figured out how to make it infinite I think it would make a good follow up question where explaining generator functions would be a reasonable stepping stone in growing the OP's programming abilities. But to be clear, generators would be good here \$\endgroup\$2020年10月05日 11:47:44 +00:00Commented Oct 5, 2020 at 11:47
-
1\$\begingroup\$ Yeah, I agree it'd be a lot at once for someone just starting out. Just wanted to throw it out there; broaden the horizon a bit more you know :) \$\endgroup\$jaaq– jaaq2020年10月05日 14:05:33 +00:00Commented Oct 5, 2020 at 14:05
-
1\$\begingroup\$ Why have a list of single character strings, when a string is already a list of characters? \$\endgroup\$Voo– Voo2020年10月05日 21:03:08 +00:00Commented Oct 5, 2020 at 21:03
-
1\$\begingroup\$ @Voo As in change
lib
to'0123...'
? It's because it won't include''
. \$\endgroup\$2020年10月05日 21:04:18 +00:00Commented Oct 5, 2020 at 21:04
It might be useful to know that python has some built-in options for performing combinatorics. In particular, I found the itertools module very handy for these kind of operations. It might be a bit advanced when still starting with Python, but in time you'll pick up many of these useful things.
For the case of bruto forcing a password, the product
method seems ideal.
For example, if you want all possible combinations with 5 digits, you can run:
from itertools import product
lib = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß ́`+#-.,><@€|^~–{[]}ÄÖÜäöü'
for combination in product(lib, repeat=5):
attempt="".join(combination) # This turns combination (a list of characters) into a string.
# Use your password attempt here
So if you want to expand this to all number of digits up to 10, you can use:
for length in range(10):
for combination in product(lib, repeat=length):
attempt="".join(combination)
# Use your password attempt here
A note on using generators
One advantage of this method is that the methods from itertools
don't store all combinations, but instead they generate them on the go. As a result, their memory usage doesn't increase with the number of combinations.
This is quite important in a scenario like yours, where the amount of possible combinations has a factorial growth. This article gives a good introduction, with some extra use-cases.
Another nice part of this is, that it's quite easy to let this code keep trying all combinations of increasing length untill something is found.
This also uses the count()
method from itertools, which is a generator that starts from a number and keeps increasing forever.
from itertools import product, count
lib = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß ́`+#-.,><@€|^~–{[]}ÄÖÜäöü'
for length in count(0):
for combination in product(lib, repeat=length):
attempt="".join(combination)
# Use your password attempt here
if password_found:
print(attempt)
break
For better usability (will come in very handy in a moment), let's change your code into a generator. That's just a function that yields values one by one when requested (or rather, technically, the generator object it returns does). So the only change is that instead of printing, you yield:
def combinations():
# List of 99 characters and a blank string:
...
else:
yield lib[counter0]+lib[counter1]+lib[counter2]+lib[counter3]+lib[counter4]+lib[counter5]+lib[counter6]+lib[counter7]+lib[counter8]+lib[counter9]
Now we can for example loop over its results and print them:
for comb in combinations():
print(comb)
Output:
0
1
2
3
4
5
6
7
8
9
A
B
...
Or we can take some to build a list:
from itertools import islice
print(list(islice(combinations(), 13)))
Output:
['', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B']
Let's watch it switch from one to two characters:
>>> list(islice(combinations(), 98, 102))
['ö', 'ü', '00', '10']
Or from two to three:
>>> list(islice(combinations(), 99*100-1, 99*100+3))
['öü', 'üü', '10', '20']
Wait what? Why is 'üü'
followed by '10'
? Shouldn't that have come a lot earlier? Did we produce that twice?
>>> list(islice(combinations(), 99*100+3)).count('10')
2
Yeah we did. Oops. So there's some bug in your code. Much harder to notice in your version, btw, with all the combinations just being printed and immediately being overwritten :-)
Anyway, I don't really want to dig deeper into that but show a simple alternative. Let's start from scratch. While we're at it, let's call it words
and make the alphabet a parameter. Start simple, only yield the words of lengths 0 and 1:
def words(alphabet):
yield ''
for letter in alphabet:
yield letter
Demo:
>>> list(words('abc'))
['', 'a', 'b', 'c']
Now how to produce the longer words? Let's see what we want:
'' '' + ''
'a' '' + 'a'
'b' '' + 'b'
'c' '' + 'c'
'aa' 'a' + 'a'
'ab' 'a' + 'b'
'ac' 'a' + 'c'
'ba' 'b' + 'a'
'bb' 'b' + 'b'
'bc' 'b' + 'c'
'ca' 'c' + 'a'
'cb' 'c' + 'b'
'cc' 'c' + 'c'
'aaa' 'aa' + 'a'
'aab' 'aa' + 'b'
'aac' 'aa' + 'c'
'aba' 'ab' + 'a'
'abb' 'ab' + 'b'
... ...
On the left is the words how we want them, and on the right I split them into prefix and last letter (if any). If we look at the last letter, we can see that it keeps cycling through the alphabet. All letters for each prefix. Let's pretend we had a prefix
function that gave us the prefixes. Then we could just write our solution like this:
def words(alphabet):
yield ''
for prefix in prefixes(alphabet):
for letter in alphabet:
yield prefix + letter
But wait. The first prefix is ''
, then 'a'
, 'b'
, 'c'
, 'aa'
, 'ab'
, etc. So the prefix just goes through the same sequence of words that we want overall. So... our words
function can use itself to produce the prefixes:
def words(alphabet):
yield ''
for prefix in words(alphabet):
for letter in alphabet:
yield prefix + letter
That's it. That's the whole solution.
Demo:
>>> list(islice(words('abc'), 20))
['', 'a', 'b', 'c', 'aa', 'ab', 'ac', 'ba', 'bb', 'bc', 'ca',
'cb', 'cc', 'aaa', 'aab', 'aac', 'aba', 'abb', 'abc', 'aca']
Finally let's try it with your alphabet again and see that switch from two to three letters:
>>> alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß ́`+#-.,><@€|^~–{[]}ÄÖÜäöü'
>>> list(islice(words(alphabet), 99*100-1, 99*100+3))
['üö', 'üü', '000', '001']
So... we ended up implementing the whole thing with a generator function that has just four simple lines, and it works with an arbitrary alphabet, and as generator it's easy to use in many ways.
It's probably also a lot faster than yours, although because of your bug, that's not easy to benchmark properly. Peilonrayz' version also has a bug at the moment, but we can compare with Ivo_Merchiers' solution and a few variations.
First ten million words using your long alphabet of 99 letters:
same first 9,999,999: True
same 10,000,000th: True {'9TT8'}
1.41 1.38 1.38 seconds Stefan_Pochmann
1.66 1.64 1.63 seconds Stefan_Pochmann_2
2.45 2.45 2.45 seconds Ivo_Merchiers
2.19 2.20 2.21 seconds Ivo_Merchiers_2
1.50 1.49 1.50 seconds Ivo_Merchiers_3
1.20 1.20 1.20 seconds Ivo_Merchiers_chain
First ten million words using alphabet abc
:
same first 9,999,999: True
same 10,000,000th: True {'abcaccbbcccacbc'}
2.49 2.43 2.48 seconds Stefan_Pochmann
4.16 4.17 4.19 seconds Stefan_Pochmann_2
3.91 3.91 3.93 seconds Ivo_Merchiers
3.64 3.66 3.64 seconds Ivo_Merchiers_2
2.74 2.74 2.75 seconds Ivo_Merchiers_3
2.45 2.46 2.45 seconds Ivo_Merchiers_chain
Full benchmark code:
from itertools import product, count, islice, chain
from timeit import repeat
from collections import deque
lib = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz°!"§$%&/()=ß ́`+#-.,><@€|^~–{[]}ÄÖÜäöü'
def Stefan_Pochmann(alphabet):
yield ''
for prefix in Stefan_Pochmann(alphabet):
for letter in alphabet:
yield prefix + letter
def Stefan_Pochmann_2(alphabet):
yield ''
for prefix in Stefan_Pochmann_2(alphabet):
yield from map(prefix.__add__, alphabet)
def Ivo_Merchiers(lib):
for length in count(0):
for combination in product(lib, repeat=length):
yield ''.join(combination)
def Ivo_Merchiers_2(lib):
join = ''.join
for length in count(0):
for combination in product(lib, repeat=length):
yield join(combination)
def Ivo_Merchiers_3(lib):
for length in count(0):
yield from map(''.join, product(lib, repeat=length))
def Ivo_Merchiers_chain(lib): # from Peilonrayz
join = ''.join
return chain.from_iterable(
map(join, product(lib, repeat=length))
for length in count(0)
)
solutions = Stefan_Pochmann, Stefan_Pochmann_2, Ivo_Merchiers, Ivo_Merchiers_2, Ivo_Merchiers_3, Ivo_Merchiers_chain
for alphabet in lib, 'abc':
print(alphabet)
n = 10**7
# Correctness
sets = map(set, zip(*(words(alphabet) for words in solutions)))
print(f'same first {n-1:,}:', all(len(s) == 1 for s in islice(sets, n - 1)))
s = next(sets)
print(f'same {n:,}th:', len(s) == 1, s)
print()
# Speed
tss = [[] for _ in solutions]
for _ in range(3):
for words, ts in zip(solutions, tss):
t = min(repeat(lambda: deque(islice(words(alphabet), n), 0), number=1))
ts.append(t)
for words, ts in zip(solutions, tss):
print(*('%.2f' % t for t in ts), 'seconds ', words.__name__, sep=' ')
print()
-
1\$\begingroup\$ @Peilonrayz Ah, I just finished the benchmark, see the update :-). Mine's fastest, as I actually expected because my "innermost loop" just adds a prefix and a letter. Faster than Ivo's
join
. I didn't include yours because you have some bug. Not sure it's the same bug you're talking about. I mean that betweenü
and00
you produce0
again. \$\endgroup\$Stefan Pochmann– Stefan Pochmann2020年10月05日 18:07:50 +00:00Commented Oct 5, 2020 at 18:07 -
1\$\begingroup\$ Interesting I'm suprised at how slow
itertools
is. I even moved a lot of Ivo's code into C and it becomes only a smidge faster than yours. Timings & code. If you want to add just your two timings here's another picture. \$\endgroup\$2020年10月05日 19:28:47 +00:00Commented Oct 5, 2020 at 19:28 -
1\$\begingroup\$ @Peilonrayz Updated. Your
chain
version is indeed the fastest. \$\endgroup\$Stefan Pochmann– Stefan Pochmann2020年10月05日 20:48:56 +00:00Commented Oct 5, 2020 at 20:48 -
\$\begingroup\$ Interesting analysis, I'm also surprised by how slow the itertools code works. Also surprising to see that chain has such a significant speed impact. \$\endgroup\$Ivo Merchiers– Ivo Merchiers2020年10月06日 06:21:59 +00:00Commented Oct 6, 2020 at 6:21
-
\$\begingroup\$ Nice answer. I wanted to propose another alternative : use 100 chars, pick any number written in base 10 and convert it to base 100, which can be done really fast. Your solution is cleaner and faster. \$\endgroup\$Eric Duminil– Eric Duminil2020年10月08日 18:44:07 +00:00Commented Oct 8, 2020 at 18:44