4
\$\begingroup\$

For the Matasano CryptoPals Challenges Set 1 Problem 8, which states:

In this file are a bunch of hex-encoded ciphertexts.

One of them has been encrypted with ECB.

Detect it.

Remember that the problem with ECB is that it is stateless and deterministic; the same 16 byte plaintext block will always produce the same 16 byte ciphertext.

I have written the following Python Code. Since I'm a programmer coming from C++ to Python, some of it might be reminiscent of that transfer. Can I be more Pythonic than the code I've written? Any more suggestions?

#! /usr/bin/env python
from binascii import a2b_hex
def is_ECB_encoded(data, block_size):
 block_count = len(data)/block_size
 for i in range(block_count):
 for j in range(i+1,block_count):
 if data[i*block_size:(i+1)*block_size] == data[j*block_size:(j+1)*block_size]:
 return True
 return False
filename = '8.txt'
for line in open(filename):
 line = line.strip()
 if is_ECB_encoded(a2b_hex(line), 16):
 print line
200_success
146k22 gold badges190 silver badges479 bronze badges
asked Jul 17, 2015 at 7:41
\$\endgroup\$

2 Answers 2

2
\$\begingroup\$

Assuming your algorithm is correct:

  1. Follow the pep8 style guide.
  2. You should use range(0, len(data), block_size) so you don't have to do all the multiplication over and over.
  3. You can use itertools.combinations to reduce it to only one loop.
  4. You don't close the file. You should use with to open and close the file.
  5. You can use zip (or for python2 itertools.izip) to get both the first and second index at the same time.
  6. You can simplify the whole thing using any and a generator expressions.
  7. You can use str.join to simplify the printing.

Since iterators and generator expressions are lazy evaluated, it won't create any intermediate arrays. This makes it very fast but also very readable.

from binascii import a2b_hex
from itertools import combinations, izip
def is_ecb_encoded(data, block_size):
 myrange = range(0, len(data), block_size)
 inds = combinations(zip(myrange, myrange[1:])), 2)
 return any(data[i1:i2] == data[j1:j2] for (i1, i2), (j1, j2) in inds)
filename = '8.txt'
with open(filename) as fobj:
 lines = (line.rstrip() for line in fobj)
 lines = (a2b_hex(line) for line in lines)
 lines = (line for line in lines if is_ecb_encoded(line, 16))
 print '\n'.join(lines)
answered Jul 17, 2015 at 8:22
\$\endgroup\$
2
  • \$\begingroup\$ Thanks! Many valid points that I had not thought about. BTW, shouldn't it be combinations(zip(myrange, myrange[1:])), 2) \$\endgroup\$ Commented Jul 17, 2015 at 9:03
  • \$\begingroup\$ Sorry, yes it should. Fixed \$\endgroup\$ Commented Jul 17, 2015 at 9:04
2
\$\begingroup\$

Standard indentation, specified in PEP 8, is 4 spaces. For a language where indentation matters, following the convention is important.

Your algorithm is inefficient, in that it compares every block with all subsequent blocks. The running time is therefore O(n2), where n is the number of blocks.

A better strategy would be to sort the blocks first, then compare just the neighboring blocks — that's O(n log n).

An even better strategy would be to use some kind of hash. collections.Counter is one way to do that:

from collections import Counter
def is_ecb_encoded(ciphertext, block_bytes):
 block_starts = range(0, len(ciphertext), block_bytes)
 block_counts = Counter(ciphertext[i : i+block_bytes] for i in block_starts)
 most_common = block_counts.most_common(1)
 return most_common and most_common[0][1] > 1

You could also write something more manual:

def is_ecb_encoded(ciphertext, block_bytes):
 blocks = set()
 for block_start in range(0, len(ciphertext), block_bytes):
 block = ciphertext[block_start : block_start+block_bytes]
 if block in blocks:
 return True
 blocks.add(block)
 return False

Both of those solutions should be O(n).

answered Jul 17, 2015 at 8:34
\$\endgroup\$
2
  • \$\begingroup\$ Thanks! I had favoured readability for efficiency for this piece of code, but seems like even after improving efficiency, it remains quite readable (my way would not have been so readable with this efficiency, due to my C++ roots). BTW, which of the two suggestions you've given would you suggest, since their readability and efficiency seems to be the same? \$\endgroup\$ Commented Jul 17, 2015 at 9:09
  • 1
    \$\begingroup\$ BTW, edited answer for correcting the return values in the manual approach. \$\endgroup\$ Commented Jul 17, 2015 at 9:10

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.