Matasano CryptoPals Problem 8: Detecting ECB encryption

Question 1

For the Matasano CryptoPals Challenges Set 1 Problem 8, which states:

In this file are a bunch of hex-encoded ciphertexts.

One of them has been encrypted with ECB.

Detect it.

Remember that the problem with ECB is that it is stateless and deterministic; the same 16 byte plaintext block will always produce the same 16 byte ciphertext.

I have written the following Python Code. Since I'm a programmer coming from C++ to Python, some of it might be reminiscent of that transfer. Can I be more Pythonic than the code I've written? Any more suggestions?

#! /usr/bin/env python
from binascii import a2b_hex
def is_ECB_encoded(data, block_size):
 block_count = len(data)/block_size
 for i in range(block_count):
 for j in range(i+1,block_count):
 if data[i*block_size:(i+1)*block_size] == data[j*block_size:(j+1)*block_size]:
 return True
 return False
filename = '8.txt'
for line in open(filename):
 line = line.strip()
 if is_ECB_encoded(a2b_hex(line), 16):
 print line

Question 2

Assuming your algorithm is correct:

Follow the pep8 style guide.
You should use range(0, len(data), block_size) so you don't have to do all the multiplication over and over.
You can use itertools.combinations to reduce it to only one loop.
You don't close the file. You should use with to open and close the file.
You can use zip (or for python2 itertools.izip) to get both the first and second index at the same time.
You can simplify the whole thing using any and a generator expressions.
You can use str.join to simplify the printing.

Since iterators and generator expressions are lazy evaluated, it won't create any intermediate arrays. This makes it very fast but also very readable.

from binascii import a2b_hex
from itertools import combinations, izip
def is_ecb_encoded(data, block_size):
 myrange = range(0, len(data), block_size)
 inds = combinations(zip(myrange, myrange[1:])), 2)
 return any(data[i1:i2] == data[j1:j2] for (i1, i2), (j1, j2) in inds)
filename = '8.txt'
with open(filename) as fobj:
 lines = (line.rstrip() for line in fobj)
 lines = (a2b_hex(line) for line in lines)
 lines = (line for line in lines if is_ecb_encoded(line, 16))
 print '\n'.join(lines)

Question 3

Thanks! Many valid points that I had not thought about. BTW, shouldn't it be combinations(zip(myrange, myrange[1:])), 2)

Question 4

Sorry, yes it should. Fixed

Question 5

Standard indentation, specified in PEP 8, is 4 spaces. For a language where indentation matters, following the convention is important.

Your algorithm is inefficient, in that it compares every block with all subsequent blocks. The running time is therefore O(n²), where n is the number of blocks.

A better strategy would be to sort the blocks first, then compare just the neighboring blocks — that's O(n log n).

An even better strategy would be to use some kind of hash. collections.Counter is one way to do that:

from collections import Counter
def is_ecb_encoded(ciphertext, block_bytes):
 block_starts = range(0, len(ciphertext), block_bytes)
 block_counts = Counter(ciphertext[i : i+block_bytes] for i in block_starts)
 most_common = block_counts.most_common(1)
 return most_common and most_common[0][1] > 1

You could also write something more manual:

def is_ecb_encoded(ciphertext, block_bytes):
 blocks = set()
 for block_start in range(0, len(ciphertext), block_bytes):
 block = ciphertext[block_start : block_start+block_bytes]
 if block in blocks:
 return True
 blocks.add(block)
 return False

Both of those solutions should be O(n).

Question 6

Thanks! I had favoured readability for efficiency for this piece of code, but seems like even after improving efficiency, it remains quite readable (my way would not have been so readable with this efficiency, due to my C++ roots). BTW, which of the two suggestions you've given would you suggest, since their readability and efficiency seems to be the same?

Question 7

BTW, edited answer for correcting the return values in the manual approach.

TheBlackCat TheBlackCat 2,44412 silver badges10 bronze badges · Answer 1 · 2015-07-17 08:22:16Z

Assuming your algorithm is correct:

Follow the pep8 style guide.
You should use range(0, len(data), block_size) so you don't have to do all the multiplication over and over.
You can use itertools.combinations to reduce it to only one loop.
You don't close the file. You should use with to open and close the file.
You can use zip (or for python2 itertools.izip) to get both the first and second index at the same time.
You can simplify the whole thing using any and a generator expressions.
You can use str.join to simplify the printing.

Since iterators and generator expressions are lazy evaluated, it won't create any intermediate arrays. This makes it very fast but also very readable.

from binascii import a2b_hex
from itertools import combinations, izip
def is_ecb_encoded(data, block_size):
 myrange = range(0, len(data), block_size)
 inds = combinations(zip(myrange, myrange[1:])), 2)
 return any(data[i1:i2] == data[j1:j2] for (i1, i2), (j1, j2) in inds)
filename = '8.txt'
with open(filename) as fobj:
 lines = (line.rstrip() for line in fobj)
 lines = (a2b_hex(line) for line in lines)
 lines = (line for line in lines if is_ecb_encoded(line, 16))
 print '\n'.join(lines)

Thanks! Many valid points that I had not thought about. BTW, shouldn't it be combinations(zip(myrange, myrange[1:])), 2)

200_success 200_success 146k22 gold badges190 silver badges479 bronze badges · Answer 2 · 2015-07-17 08:34:32Z

Standard indentation, specified in PEP 8, is 4 spaces. For a language where indentation matters, following the convention is important.

Your algorithm is inefficient, in that it compares every block with all subsequent blocks. The running time is therefore O(n²), where n is the number of blocks.

A better strategy would be to sort the blocks first, then compare just the neighboring blocks — that's O(n log n).

An even better strategy would be to use some kind of hash. collections.Counter is one way to do that:

from collections import Counter
def is_ecb_encoded(ciphertext, block_bytes):
 block_starts = range(0, len(ciphertext), block_bytes)
 block_counts = Counter(ciphertext[i : i+block_bytes] for i in block_starts)
 most_common = block_counts.most_common(1)
 return most_common and most_common[0][1] > 1

You could also write something more manual:

def is_ecb_encoded(ciphertext, block_bytes):
 blocks = set()
 for block_start in range(0, len(ciphertext), block_bytes):
 block = ciphertext[block_start : block_start+block_bytes]
 if block in blocks:
 return True
 blocks.add(block)
 return False

Both of those solutions should be O(n).

Thanks! I had favoured readability for efficiency for this piece of code, but seems like even after improving efficiency, it remains quite readable (my way would not have been so readable with this efficiency, due to my C++ roots). BTW, which of the two suggestions you've given would you suggest, since their readability and efficiency seems to be the same?
BTW, edited answer for correcting the return values in the manual approach.

Stack Exchange Network

Matasano CryptoPals Problem 8: Detecting ECB encryption

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Matasano CryptoPals Problem 8: Detecting ECB encryption

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions