8
\$\begingroup\$

I have recently been working through a number of the Cryptopals Cryptography Challenges to try and improve my knowledge and understanding of both cryptography and Python. As five of the first six challenges are XOR related problems I thought it would be a good idea to compile my work into a single program capable of encrypting, decrypting, and cracking using the XOR cipher. The program is capable of both single-byte and multi-byte encryption modes and can employ statistical analysis to guess a key when none is given.

I have previously asked for reviews on my Ceasar and Vigenere implementations/crackers and have included all of them together as a small suite for these fun little ciphers. I won't include all the code here but if possible I would like to know what improvements I could make to the overall structure of the project as I am trying to learn about how to organise a project like this with the intention of growing it with more and more encryption tools in the future.

What I Would Like Feedback On

  • Correctness of my implementations. Please let me know if there are any errors in my code.
  • Readability, Pythonic-ness, style, and documentation. I am learning Python with the intention of working with large teams on projects in the future and I feel these will be important for collaborative work.
  • There is an issue with the method predictKeySize() in XORAnalysis.py in which it will have a strong bias towards short keys if it is allowed to guess short keys. As such it is currently hard coded to only guess lengths greater than 6 meaning my program is incapable of cracking keys between two and five characters long. Any ideas about how to improve this would be much appreciated
  • Performance improvements and memory usage reductions. Not as important as other areas as the program isn't particularly slow or resource intensive but still nice to know about.

The Code

xor.py

#!/usr/bin/python3
"""
 Filename: xor.py
 Date: 15/07/20
 Licence: GNU GPL V3
 
 Multipurpose XOR Encryption tool, can encrypt and decrypt text using a specified single-byte or multi-byte key or attempt to decrypt an input without a given key by using statistical analysis
 Options:
 --encrypt Enable encryption mode (Default)
 --decrypt Enable decryption mode
 --key Specify the encryption key
 --guess Attempt to guess the encryption key by statistical analysis
 --single-byte Enable single-byte XOR mode (Default)
 --multi-byte Enable multi-byte XOR mode
"""
import argparse
import string
import codecs
import sys
from itertools import cycle
from internal.XORAnalysis import predictKeySize, multiByteXORCrack, multiByteXOR, repeatingByteXOR, repeatingByteXORCrack
def initialiseParser():
 parser = argparse.ArgumentParser(description = "Encrypt, decrypt, or crack a message using the XOR Cipher")
 parser.add_argument("--key", "-k", help = "The encryption key to be used (if relevant)", type = str)
 parser.add_argument("--guess", "-g", help = "Perform statistical analysis to estimate the most likely value of the encryption key", action = "store_true")
 parser.add_argument("--single-byte", "--single", "-s", help = "Enable single-byte key mode", action = "store_true")
 parser.add_argument("--multi-byte", "--multi", "-m", help = "Enable multi-byte key mode", action = "store_true")
 parser.add_argument("--decrypt", "-d", help = "Enable decryption mode", action = "store_true")
 return parser
def main():
 parser = initialiseParser()
 args = parser.parse_args()
 inputString = sys.stdin.read().encode()
 if args.decrypt or args.guess:
 inputString = codecs.decode(inputString, "base-64")
 if args.guess:
 if args.multi_byte:
 print("[+] Selecting multi-byte key mode...", file = sys.stderr)
 print("[+] Predicting key length...", file = sys.stderr) # At this point we have the entire decoded input in memory, all that is left is to crack it
 keyLength = predictKeySize(inputString)
 print("[-] Got length of {}...\n[+] Attempting to crack key...".format(keyLength), file = sys.stderr)
 crack = multiByteXORCrack(inputString, keyLength)
 key = crack['key']
 else:
 print("[+] Selecting single-byte key mode...", file = sys.stderr)
 print("[+] Attempting to crack key...", file = sys.stderr)
 crack = repeatingByteXORCrack(inputString)
 key = chr(crack['key'])
 print("[-] Got key: \"{}\" !\n[+] Decrypting message...".format(key), file = sys.stderr)
 output = crack['message']
 elif args.key != None:
 if len(args.key) > 1 and not args.multi_byte:
 print("[+] Single-byte mode selected but multi-byte key was given. Defaulting to multi-byte mode...", file = sys.stderr)
 args.multi_byte = True
 output = multiByteXOR(inputString, [ord(c) for c in args.key]) if args.multi_byte else repeatingByteXOR(inputString, ord(args.key))
 
 else:
 print("[-] Error: No key given!", file = sys.stderr)
 return
 if not args.decrypt and not args.guess:
 output = codecs.encode(output.encode(), "base-64").decode()
 print(output, end = "")
if __name__ == "__main__":
 main()

XORAnalysis.py

"""
 Filename: XORAnalysis.py
 Date: 19/06/20
 Licence: GNU GPL V3
 
 A collection of analysis functions and pieces of information required byciphertools programs which implement XOR-based algorithms
 
"""
from itertools import cycle
import string
from .Strings import alphanumeric_characters, buildSubStrings
# XOR analysis functions
def letterRatio(inputString):
 return sum([x in alphanumeric_characters for x in inputString]) / len(inputString)
def probablyText(inputString):
 return letterRatio(inputString) > 0.7
# Functions for single-byte key XOR
def repeatingByteXOR(inputString, byte):
 return "".join(chr(c ^ byte) for c in inputString)
def repeatingByteXORCrack(inputString):
 best = None
 for byte in range(256):
 currentString = repeatingByteXOR(inputString.strip(), byte)
 num_chars = sum([x in alphanumeric_characters for x in currentString])
 if best == None or num_chars > best['num_chars']:
 best = { 'message': currentString, 'num_chars': num_chars, 'key': byte }
 return best
# Functions for multi-byte key XOR
def multiByteXORCrack(inputString, keyLength):
 key = "".join(chr(repeatingByteXORCrack(string.strip())['key']) for string in buildSubStrings(inputString, keyLength))
 message = multiByteXOR(inputString, key.encode())
 return { 'message': message, 'key': key }
def multiByteXOR(inputString, key):
 return "".join(chr(c ^ byte) for c, byte in zip(inputString, cycle(key)))
# Functions for multi-byte XOR key length prediction
def XORStrings(first, second):
 return bytes([i ^ j for i, j in zip(first, second)]) # Convert two byte strings to their xor product
def hammingDistance(first, second):
 return bin(int.from_bytes(XORStrings(first, second), "little")).count("1") # Calculate the bit difference between two strings
def predictKeySize(inputString):
 bestKey = 0
 bestDistance = 10000
 for i in range(6, 40): # Set to a lower bound of 6 because otherwise it always guesses a really short key. Will try and fix in later version.
 distance = 0
 blocks = len(inputString) // i - 1
 for x in range(blocks):
 distance += hammingDistance(inputString[i * x:i * (x + 2) - 1], inputString[i * (x + 2):i * (x + 4) - 1])
 distance /= i
 distance /= blocks
 if distance < bestDistance:
 bestDistance = distance
 bestKey = i
 return bestKey

Strings.py

"""
 Filename: strings.py
 Date: 28/09/19
 Licence: GNU GPL V3
 
 
 A collection of functions for the modification of strings required by multiple programs in the ciphertools suite
"""
import re
alphanumeric_characters = "1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz "
english = { 'monogram-frequencies': [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074 ],
 'bigram-frequencies': [] }
def stringPrepare(string, preserveSpacing): # Strip all non alphabetic characters from a string and convert to upper case
 return re.compile("[^A-Z\s]" if preserveSpacing else "[^A-Z]").sub("", string.upper())
def buildSubStrings(string, separation): # Build a list of substrings required to analyse the ciphertext
 return [string[i::separation] for i in range(separation)]
asked Jul 15, 2020 at 13:07
\$\endgroup\$

1 Answer 1

6
\$\begingroup\$

Nomenclature

By PEP8, initialiseParser should be initialise_parser, and similarly for inputString, etc.

String interpolation

print("[-] Got length of {}...\n[+] Attempting to crack key...".format(keyLength), file = sys.stderr)

is simpler as

print(
 f"[-] Got length of {key_length}...\n"
 "Attempting to crack key...",
 file=sys.stderr,
)

Type hints

For example,

def probablyText(inputString):

can be

def probably_text(input_string: str) -> bool:

Sum without a comprehension

sum([x in alphanumeric_characters for x in currentString])

should use the generator directly instead of making a list; i.e.

sum(x in alphanumeric_characters for x in current_string)

The same goes for

return bytes([i ^ j for i, j in zip(first, second)]) # Convert two byte strings to their xor product

Strongly-typed results

best = { 'message': currentString, 'num_chars': num_chars, 'key': byte }

If you're only doing this because you need to return multiple things, idiomatic Python is to simply return them as a tuple, i.e.

best = current_string, num_chars, byte
# ...
return best

But this would be better-represented by a named tuple, or (better) a @dataclass with type hints. Just not a dictionary.

Combined division

 distance /= i
 distance /= blocks

can be

distance /= i * blocks

Sums rather than successive addition

 for x in range(blocks):
 distance += hammingDistance(inputString[i * x:i * (x + 2) - 1], inputString[i * (x + 2):i * (x + 4) - 1])

can be

distance = sum(
 hamming_distance(
 input_string[i*x : i*(x+2)-1],
 input_string[i*(x+2) : i*(x+4)-1],
 )
 for x in range(blocks)
)

Drop dictionaries to variables

Given your current code,

english = { 'monogram-frequencies': [8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153, 0.772, 4.025, 2.406, 6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074 ],
 'bigram-frequencies': [] }

should simply be a monogram variable and a bigram variable.

answered Jul 15, 2020 at 15:08
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.