For the Matasano CryptoPals Challenges Set 1 Problem 8, which states:
In this file are a bunch of hex-encoded ciphertexts.
One of them has been encrypted with ECB.
Detect it.
Remember that the problem with ECB is that it is stateless and deterministic; the same 16 byte plaintext block will always produce the same 16 byte ciphertext.
I have written the following Python Code. Since I'm a programmer coming from C++ to Python, some of it might be reminiscent of that transfer. Can I be more Pythonic than the code I've written? Any more suggestions?
#! /usr/bin/env python
from binascii import a2b_hex
def is_ECB_encoded(data, block_size):
block_count = len(data)/block_size
for i in range(block_count):
for j in range(i+1,block_count):
if data[i*block_size:(i+1)*block_size] == data[j*block_size:(j+1)*block_size]:
return True
return False
filename = '8.txt'
for line in open(filename):
line = line.strip()
if is_ECB_encoded(a2b_hex(line), 16):
print line
2 Answers 2
Assuming your algorithm is correct:
- Follow the pep8 style guide.
- You should use
range(0, len(data), block_size)
so you don't have to do all the multiplication over and over. - You can use
itertools.combinations
to reduce it to only one loop. - You don't close the file. You should use
with
to open and close the file. - You can use
zip
(or for python2itertools.izip
) to get both the first and second index at the same time. - You can simplify the whole thing using
any
and a generator expressions. - You can use
str.join
to simplify the printing.
Since iterators and generator expressions are lazy evaluated, it won't create any intermediate arrays. This makes it very fast but also very readable.
from binascii import a2b_hex
from itertools import combinations, izip
def is_ecb_encoded(data, block_size):
myrange = range(0, len(data), block_size)
inds = combinations(zip(myrange, myrange[1:])), 2)
return any(data[i1:i2] == data[j1:j2] for (i1, i2), (j1, j2) in inds)
filename = '8.txt'
with open(filename) as fobj:
lines = (line.rstrip() for line in fobj)
lines = (a2b_hex(line) for line in lines)
lines = (line for line in lines if is_ecb_encoded(line, 16))
print '\n'.join(lines)
-
\$\begingroup\$ Thanks! Many valid points that I had not thought about. BTW, shouldn't it be
combinations(zip(myrange, myrange[1:])), 2)
\$\endgroup\$Jay Bosamiya– Jay Bosamiya2015年07月17日 09:03:58 +00:00Commented Jul 17, 2015 at 9:03 -
\$\begingroup\$ Sorry, yes it should. Fixed \$\endgroup\$TheBlackCat– TheBlackCat2015年07月17日 09:04:55 +00:00Commented Jul 17, 2015 at 9:04
Standard indentation, specified in PEP 8, is 4 spaces. For a language where indentation matters, following the convention is important.
Your algorithm is inefficient, in that it compares every block with all subsequent blocks. The running time is therefore O(n2), where n is the number of blocks.
A better strategy would be to sort the blocks first, then compare just the neighboring blocks — that's O(n log n).
An even better strategy would be to use some kind of hash. collections.Counter
is one way to do that:
from collections import Counter
def is_ecb_encoded(ciphertext, block_bytes):
block_starts = range(0, len(ciphertext), block_bytes)
block_counts = Counter(ciphertext[i : i+block_bytes] for i in block_starts)
most_common = block_counts.most_common(1)
return most_common and most_common[0][1] > 1
You could also write something more manual:
def is_ecb_encoded(ciphertext, block_bytes):
blocks = set()
for block_start in range(0, len(ciphertext), block_bytes):
block = ciphertext[block_start : block_start+block_bytes]
if block in blocks:
return True
blocks.add(block)
return False
Both of those solutions should be O(n).
-
\$\begingroup\$ Thanks! I had favoured readability for efficiency for this piece of code, but seems like even after improving efficiency, it remains quite readable (my way would not have been so readable with this efficiency, due to my C++ roots). BTW, which of the two suggestions you've given would you suggest, since their readability and efficiency seems to be the same? \$\endgroup\$Jay Bosamiya– Jay Bosamiya2015年07月17日 09:09:24 +00:00Commented Jul 17, 2015 at 9:09
-
1\$\begingroup\$ BTW, edited answer for correcting the return values in the manual approach. \$\endgroup\$Jay Bosamiya– Jay Bosamiya2015年07月17日 09:10:27 +00:00Commented Jul 17, 2015 at 9:10
Explore related questions
See similar questions with these tags.