Caesar Cipher Encryption / Decryption in Python

Question 1

I'm trying to write a Python program to predict the most likely substitution for a Caesar cipher. I managed to get it working, but it just seems very clunky and slow, especially because I need to preserve cases and non-alphabetical characters.

Would it be cleaner / more pythonic to use the chr() and ord() functions instead of the ALPHABET list?
Any suggestions on how I could improve it overall (while also maintaining relative readability)?

Thank you so much!

ALPHABET = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
FREQUENCIES = [.0817, .0149, .0278, .0425, .1270, .0223, .0202, .0609, .0697, .0015, .0077, .0402, .0241, .0675, .0751, .0193, .0009, .0599, .0633, .0906, .0276, .0098, .0236, .0015, .0197, .0007]
def getGoodness(msg):
 msg_length = sum([1 for i in msg if i.isalpha()])
 msg_frqs = [round(msg.count(a) / msg_length, 4) for a in ALPHABET]
 diffs = [abs(a - b) for a, b in zip(msg_frqs, FREQUENCIES)]
 avg_diff = round(sum(diffs) / len(diffs), 4)
 return avg_diff
def shift(string, n):
 shifted = [i for i in string]
 for char in range(len(shifted)):
 if shifted[char].isalpha():
 new_char = ALPHABET[(ALPHABET.index(shifted[char].lower()) + n) % len(ALPHABET)]
 if shifted[char].isupper():
 new_char = new_char.upper()
 shifted[char] = new_char
 return ''.join(shifted)
#########################################
msg = "Oqqcfrwbu hc ozz ybckb zokg ct ojwohwcb, hvsfs wg bc kom o pss gvcizr ps opzs hc tzm. Whg kwbug ofs hcc gaozz hc ush whg toh zwhhzs pcrm ctt hvs ufcibr. Hvs pss, ct qcifgs, tzwsg obmkom psqoigs pssg rcb'h qofs kvoh viaobg hvwby wg wadcggwpzs. Mszzck, pzoqy. Mszzck, pzoqy. Mszzck, pzoqy. Mszzck, pzoqy. Ccv, pzoqy obr mszzck! Zsh'g gvoys wh id o zwhhzs. Poffm! Pfsoytogh wg fsorm!"
goodness = []
shifted = []
for i in range(26):
 goodness.append(getGoodness(shift(msg, i)))
 shifted.append(shift(msg, i))
gdict = dict(zip(goodness, shifted))
print(gdict.get(min(gdict.keys())))

Question 2

The shift() function was fairly easy to understand, in no small part because you've chosen reasonable and clear variable names. Here are a few relatively minor suggestions regarding the function. (1) Strings are iterable, so you can convert to characters directly. (2) Python makes iteration so easy that looping over data via an index is usually not needed. Instead, just iterate directly. And for those cases when an index is needed, enumerate() is often the best option, because it provides both index and value -- which helps your code by reducing its visual weight and thus aiding readability. (3) The assignment to new_char is a moderately complex expression. One way to improve code readability is to such computations apart. (4) A dict to lookup a letter's index would eliminate the need to scan ALPHABET repeatedly.

# 4
ALPHA_IDXS = {a:i for i, a in enumerate(ALPHABET)}
def shift(string, n):
 # 1
 shifted = list(string)
 # 2
 for i, char in enumerate(shifted):
 if char.isalpha():
 # 3 and 4
 j = (ALPHA_IDXS[char.lower()] + n) % len(ALPHABET)
 new_char = ALPHABET[j]
 if char.isupper():
 new_char = new_char.upper()
 shifted[i] = new_char # The only place we use index.
 return ''.join(shifted)

Another technique to consider is converting shift() into a generator function. This simplifies the code even more because the function no longer has to assemble a return data collection -- although at a small cost to the caller, which must do the assembling. Here's what that might look like.

def shift(string, n):
 for i, char in enumerate(string):
 if char.isalpha():
 j = (ALPHA_IDXS[char.lower()] + n) % len(ALPHABET)
 new = ALPHABET[j]
 char = new if char.islower() else new.upper()
 yield char

You did a nice job in getGoodness() with keeping the steps of the algorithm very clear. Here are a few suggestions: (1) Because bool is a subclass of int, you can treat True and False as integers -- just add up the Trues. Also, functions like sum() take any iterable, so you don't need an intermediate list. (2) msg_length seems a bit misleading: it is not the length of msg; it is the number of alpha characters. (3) A Counter can compute letter frequencies for you. (4) Shouldn't this calculation be done on a lowercase version of the text?

from collections import Counter
def getGoodness(msg):
 # 1 and 2
 n_alphas = sum(i.isalpha() for i in msg)
 # 3 and 4
 freqs = Counter(msg.lower())
 msg_frqs = [round(freqs[a] / n_alphas, 4) for a in ALPHABET]
 diffs = [abs(a - b) for a, b in zip(msg_frqs, FREQUENCIES)]
 avg_diff = round(sum(diffs) / len(diffs), 4)
 return avg_diff

Finally, some comments about the orchestration code. (1) Put your code in functions. (2) Set yourself up for testing. It helps a lot during the development and debugging lifecycle. (3) In the loop to compute goodnesses for the shifted strings, a tuple seems like a handier data structure than a dict. It simplifies the code and is better suited for min(). (4) I recommend the habit of setting up all scripts so they can take arguments -- again, useful during development and sometimes usage.

# 1
def main(args):
 # 2
 TESTS = (
 (
 "Oqqcfrwbu hc ozz ybckb zokg ct ojwohwcb, hvsfs wg bc kom o pss gvcizr ps opzs hc tzm. Whg kwbug ofs hcc gaozz hc ush whg toh zwhhzs pcrm ctt hvs ufcibr. Hvs pss, ct qcifgs, tzwsg obmkom psqoigs pssg rcb'h qofs kvoh viaobg hvwby wg wadcggwpzs. Mszzck, pzoqy. Mszzck, pzoqy. Mszzck, pzoqy. Mszzck, pzoqy. Ccv, pzoqy obr mszzck! Zsh'g gvoys wh id o zwhhzs. Poffm! Pfsoytogh wg fsorm!",
 "According to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible. Yellow, black. Yellow, black. Yellow, black. Yellow, black. Ooh, black and yellow! Let's shake it up a little. Barry! Breakfast is ready!",
 ),
 (
 "Ebiil, tloia!",
 "Hello, world!",
 ),
 )
 for msg, exp in TESTS:
 # 3
 scores = []
 for i in range(26):
 txt = ''.join(shift(msg, i))
 g = (getGoodness(txt), txt)
 scores.append(g)
 got = min(scores)[1]
 # 2 Always be testing.
 print(exp == got, '=>', got[0:30])
# 4
if __name__ == '__main__':
 main(sys.argv[1:])

Question 3

thank you so much for your help! I have a question though - is a Counter necessarily faster or better in some way than the .count() function? thanks!

Question 4

@tommy2111111111 No, not faster (often slower). But it can simplify code in a variety of situations. It's not clear how much simplicity it provides in this case -- not a lot -- but it is a useful data structure to know about.

Question 5

You're diffing msg_frqs with FREQUENCIES. The latter's sum is 1, the former's isn't, because you're counting the upper case letters in msg_length but not in msg_frqs. That seems inappropriate. So better start getGoodness with msg = msg.lower().

And I wouldn't round. Why throw away information? Especially using extra code and time.

Question 6

thank you! this was helpful. totally forgot about the msg = msg.lower()

Question 7

Your part

goodness = []
shifted = []
for i in range(26):
 goodness.append(getGoodness(shift(msg, i)))
 shifted.append(shift(msg, i))
gdict = dict(zip(goodness, shifted))
print(gdict.get(min(gdict.keys())))

is rather complicated. You could just use min-by-goodness:

shifts = (shift(msg, i) for i in range(26))
print(min(shifts, key=getGoodness))

Also takes less memory, as it just keeps the best shift so far, doesn't build a list+dict containing all of them like yours.

Question 8

oh got it, yeah just figured it out when I tested out the code lol. how would we get the corresponding number of shifts?

Question 9

Not sure what you mean with that...

Question 10

like the second line print(min(shifts, key=getGoodness)) outputs the most likely plaintext, right? how would I get the number of places the ciphertext was shifted to the right to produce that most likely output?

Question 11

Ah, so something your original doesn't do? You could compute it separately with min(range(26), key=lambda i: getGoodness(shift(msg, i))) (didn't test this) or you could take the input message and the output message and find the first difference.

FMc FMcFMc 13.1k2 gold badges29 silver badges40 bronze badges · Accepted Answer · 2021-03-10 07:57:01Z

The shift() function was fairly easy to understand, in no small part because you've chosen reasonable and clear variable names. Here are a few relatively minor suggestions regarding the function. (1) Strings are iterable, so you can convert to characters directly. (2) Python makes iteration so easy that looping over data via an index is usually not needed. Instead, just iterate directly. And for those cases when an index is needed, enumerate() is often the best option, because it provides both index and value -- which helps your code by reducing its visual weight and thus aiding readability. (3) The assignment to new_char is a moderately complex expression. One way to improve code readability is to such computations apart. (4) A dict to lookup a letter's index would eliminate the need to scan ALPHABET repeatedly.

# 4
ALPHA_IDXS = {a:i for i, a in enumerate(ALPHABET)}
def shift(string, n):
 # 1
 shifted = list(string)
 # 2
 for i, char in enumerate(shifted):
 if char.isalpha():
 # 3 and 4
 j = (ALPHA_IDXS[char.lower()] + n) % len(ALPHABET)
 new_char = ALPHABET[j]
 if char.isupper():
 new_char = new_char.upper()
 shifted[i] = new_char # The only place we use index.
 return ''.join(shifted)

Another technique to consider is converting shift() into a generator function. This simplifies the code even more because the function no longer has to assemble a return data collection -- although at a small cost to the caller, which must do the assembling. Here's what that might look like.

def shift(string, n):
 for i, char in enumerate(string):
 if char.isalpha():
 j = (ALPHA_IDXS[char.lower()] + n) % len(ALPHABET)
 new = ALPHABET[j]
 char = new if char.islower() else new.upper()
 yield char

You did a nice job in getGoodness() with keeping the steps of the algorithm very clear. Here are a few suggestions: (1) Because bool is a subclass of int, you can treat True and False as integers -- just add up the Trues. Also, functions like sum() take any iterable, so you don't need an intermediate list. (2) msg_length seems a bit misleading: it is not the length of msg; it is the number of alpha characters. (3) A Counter can compute letter frequencies for you. (4) Shouldn't this calculation be done on a lowercase version of the text?

from collections import Counter
def getGoodness(msg):
 # 1 and 2
 n_alphas = sum(i.isalpha() for i in msg)
 # 3 and 4
 freqs = Counter(msg.lower())
 msg_frqs = [round(freqs[a] / n_alphas, 4) for a in ALPHABET]
 diffs = [abs(a - b) for a, b in zip(msg_frqs, FREQUENCIES)]
 avg_diff = round(sum(diffs) / len(diffs), 4)
 return avg_diff

Finally, some comments about the orchestration code. (1) Put your code in functions. (2) Set yourself up for testing. It helps a lot during the development and debugging lifecycle. (3) In the loop to compute goodnesses for the shifted strings, a tuple seems like a handier data structure than a dict. It simplifies the code and is better suited for min(). (4) I recommend the habit of setting up all scripts so they can take arguments -- again, useful during development and sometimes usage.

# 1
def main(args):
 # 2
 TESTS = (
 (
 "Oqqcfrwbu hc ozz ybckb zokg ct ojwohwcb, hvsfs wg bc kom o pss gvcizr ps opzs hc tzm. Whg kwbug ofs hcc gaozz hc ush whg toh zwhhzs pcrm ctt hvs ufcibr. Hvs pss, ct qcifgs, tzwsg obmkom psqoigs pssg rcb'h qofs kvoh viaobg hvwby wg wadcggwpzs. Mszzck, pzoqy. Mszzck, pzoqy. Mszzck, pzoqy. Mszzck, pzoqy. Ccv, pzoqy obr mszzck! Zsh'g gvoys wh id o zwhhzs. Poffm! Pfsoytogh wg fsorm!",
 "According to all known laws of aviation, there is no way a bee should be able to fly. Its wings are too small to get its fat little body off the ground. The bee, of course, flies anyway because bees don't care what humans think is impossible. Yellow, black. Yellow, black. Yellow, black. Yellow, black. Ooh, black and yellow! Let's shake it up a little. Barry! Breakfast is ready!",
 ),
 (
 "Ebiil, tloia!",
 "Hello, world!",
 ),
 )
 for msg, exp in TESTS:
 # 3
 scores = []
 for i in range(26):
 txt = ''.join(shift(msg, i))
 g = (getGoodness(txt), txt)
 scores.append(g)
 got = min(scores)[1]
 # 2 Always be testing.
 print(exp == got, '=>', got[0:30])
# 4
if __name__ == '__main__':
 main(sys.argv[1:])

thank you so much for your help! I have a question though - is a Counter necessarily faster or better in some way than the .count() function? thanks!
@tommy2111111111 No, not faster (often slower). But it can simplify code in a variety of situations. It's not clear how much simplicity it provides in this case -- not a lot -- but it is a useful data structure to know about.

Stack Exchange Network

Caesar Cipher Encryption / Decryption in Python

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Caesar Cipher Encryption / Decryption in Python

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions