As I was reading the Git Internals chapter of the Pro Git book, I decided to see what objects I would need to create to make the short hash say something funny. To do this, I wrote the following code:
    import hashlib
    import string
    from itertools import product

    messages = ['badcafe', 'deaddad', 'badbeef', 'feedbac']
    possible_letters = string.ascii_lowercase + string.ascii_uppercase + ' '

    def getHash(data):
        return hashlib.sha1(("blob " + str(len(data)) + "0円" + data).encode('utf-8')).hexdigest()

    def populate(n):
        for data in product(possible_letters, repeat=n):
            s = getHash(''.join(data))[0:7]
            if s in messages:
                print(s, data)
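For reference, the hash this produces matches what `git hash-object` reports for the same data, which is how I checked the header format (a quick sanity check, assuming `git` is available on the PATH):

    import subprocess

    def gitHash(data):
        # Ask git itself for the blob hash of the same content (sanity check only).
        out = subprocess.run(['git', 'hash-object', '--stdin'],
                             input=data.encode('utf-8'),
                             stdout=subprocess.PIPE, check=True)
        return out.stdout.decode().strip()

    assert getHash('hello') == gitHash('hello')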
Unfortunately, this is too slow for n > 4. For smaller n, it only finds badcafe. Is there any way to speed this up?
1 Answer
That's a brute-force search of hashes for a particular prefix, so its complexity is \$O(2^n)\,ドル where \$n\$ is the number of bits in the prefix. In your case that's \2ドル^{28}\$.
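To spell that out: each hex digit of the prefix carries 4 bits, so a 7-character prefix is \7ドル \times 4 = 28\$ bits and there are \16ドル^7 = 2^{28} \approx 2.7 \times 10^8\$ possible prefixes. With four target strings you would expect to try roughly \2ドル^{28} / 4 = 2^{26} \approx 6.7 \times 10^7\$ candidates before the first hit.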
You can't make it faster in terms of complexity, or the hash would be broken, but you can improve the constant factor.
Using cProfile I see that out of 30 seconds of runtime the worst offenders are:
     ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    8042221   10.763    0.000   24.243    0.000 tmp.py:10(getHash)
          5    5.138    1.028   30.701    6.140 tmp.py:13(populate)
    8042222    3.805    0.000    3.805    0.000 {_hashlib.openssl_sha1}
    8042221    5.143    0.000    5.143    0.000 {method 'encode' of 'str' objects}
    8042221    3.876    0.000    3.876    0.000 {method 'hexdigest' of '_hashlib.HASH' objects}
    8042221    1.320    0.000    1.320    0.000 {method 'join' of 'str' objects}
The actual hashing is done in `hexdigest` and OpenSSL. Those pretty much cannot be reduced, so at best it could be made about 3 times as fast.
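If you want to reproduce the profile, something along these lines works (the exact range of `n` values here is my guess):

    import cProfile

    # Profile the search for several string lengths and sort by own time.
    cProfile.run("for n in range(1, 5): populate(n)", sort="tottime")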
In Python 2 you can remove the `encode` invocation; that alone saves a few seconds. A `set` is faster for `in` testing, so that's also an easy fix (the one-line change is shown after the snippet below). Take advantage of `hashlib.sha1`'s `copy` and `update` by noticing that the length stays constant, and you are at 2x the speed:
    def getHash(data, h):
        h = h.copy()  # "fork" the hash
        h.update(data)
        return h.hexdigest()

    def populate(n):
        h = hashlib.sha1("blob " + str(n) + "0円")  # shared part of the hash
        for data in product(possible_letters, repeat=n):
            s = getHash(''.join(data), h)[0:7]
            if s in messages:
                print(s, data)
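The `set` change mentioned above is just a one-line replacement of the `messages` list from the question:

    messages = {'badcafe', 'deaddad', 'badbeef', 'feedbac'}  # set gives O(1) membership tests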
On Python 3 you can encode the letters into bytes and avoid `encode` inside the loop, for similar performance.
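For example, a minimal sketch of that Python 3 variant (it reuses `getHash` from the snippet above; pre-encoding the letters is the only change I'm illustrating):

    # Pre-encode everything once so the hot loop only touches bytes.
    possible_letters = [c.encode('ascii') for c in string.ascii_letters + ' ']

    def populate(n):
        h = hashlib.sha1(("blob " + str(n) + "0円").encode('ascii'))  # shared header
        for data in product(possible_letters, repeat=n):
            s = getHash(b''.join(data), h)[0:7]
            if s in messages:
                print(s, b''.join(data).decode('ascii'))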
You might still make it about 1.5x as fast if you replaced the `product` -> `join` algorithm with something faster (`bytearray(n)` and depth-first search?), but that's about it.
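In case it helps, here is one way that `bytearray`-plus-DFS idea could look. This is an untested Python 3 sketch with names of my own choosing; the point is that candidates sharing a prefix reuse the hash state of that prefix instead of re-hashing it every time:

    import hashlib
    import string

    messages = {'badcafe', 'deaddad', 'badbeef', 'feedbac'}
    letters = (string.ascii_letters + ' ').encode('ascii')

    def search(h, buf, depth):
        # h already contains the git header plus buf[:depth].
        if depth == len(buf):
            s = h.hexdigest()[:7]
            if s in messages:
                print(s, buf.decode('ascii'))
            return
        for b in letters:                      # iterating over bytes yields ints
            buf[depth] = b
            child = h.copy()                   # fork the shared prefix state
            child.update(buf[depth:depth + 1])
            search(child, buf, depth + 1)

    def populate(n):
        header = hashlib.sha1(("blob " + str(n) + "0円").encode('ascii'))
        search(header, bytearray(n), 0)

Whether this actually beats the version above depends on how much the per-candidate `join` and full-length `update` cost compared to the extra `copy` calls, so treat the ~1.5x figure as an upper bound to verify by profiling.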