Decrypting single byte XOR encryption

Question 1

The code, along with the utils module, is also on GitHub. If you have any advice on how the whole crate is structured, I'd love to hear it. :)

use std::ascii::AsciiExt;
use std::str;
use utils::hex_to_bytes;
use s1c2::fixed_xor;

/// Scores ASCII text represented by a byte array. The higher the score, the more common
/// English characters the text contains. Letter frequencies are taken from
/// https://en.wikipedia.org/wiki/Letter_frequency.
fn score_text(text: &[u8]) -> f32 {
    let frequencies = "xzqkjupnlgeyihrmfsdcbwaot";
    let text = str::from_utf8(text).unwrap();
    let score: usize = text.chars().map(|letter| {
        frequencies.find(letter.to_ascii_lowercase()).unwrap_or(0)
    }).sum();
    score as f32 / text.len() as f32
}

/// Tries to decrypt text encrypted with a single character XOR
/// encryption.
pub fn decrypt_xor(ciphertext: &str) -> Option<(char, String)> {
    let cipherbytes = hex_to_bytes(ciphertext);
    let mut max = 0.0;
    let mut best_solution = None;
    // 32 to 127 should cover printable ASCII characters
    for character in 32..128 {
        let cipher = vec![character; cipherbytes.len()];
        let plaintext = fixed_xor(&cipherbytes, &cipher);
        let score = score_text(&plaintext);
        if score > max {
            max = score;
            best_solution = Some((character as char, String::from_utf8(plaintext).unwrap()));
        }
    }
    best_solution
}

#[test]
fn test_score_text() {
    assert_eq!(score_text(b"x"), 0.0);
    assert_eq!(score_text(b"Z"), 1.0);
    assert_eq!(score_text(b"$"), 0.0);
    assert_eq!(score_text(b"zZz"), 1.0);
}

#[test]
fn test_decrypt_xor() {
    assert_eq!(decrypt_xor("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"),
               Some(('X', "Cooking MC's like a pound of bacon".to_string())));
}

Answer

Perform the UTF-8 conversion outside of score_text. With a name like _text, it should probably accept a &str anyway. Note that types are a kind of static assertion about the data; &str is a way of saying "UTF-8 encoded bytes". This also helps highlight that the UTF-8 check was being done twice.
Use expect instead of unwrap. When the code fails, you will be thankful.
The tracking of the maximum is annoying. Ideally, you'd be able to just say max_by_key on the iterator, but f32 doesn't implement Ord. There's the possibility of creating a wrapper type that ensures it is never NaN, but that feels like overkill here. You could use max_by, but that was not stable at the time of writing (late 2016), and it would still involve an unwrap on the result of partial_cmp. Instead, I'd note that all of the strings are the same length here, so dividing by the length is unnecessary. Adjust the score function to return the previous numerator as a usize and then you can use max_by_key.
Making this change shows that you didn't just want "max"; you were relying on finding the first one. Evidently, max_by_key returns the last of several equally maximal elements. This causes your test to fail and pick x instead of X as the key. You would have seen this problem previously if the key had been x. Decrypting with x flips the case of every letter, which the case-insensitive score cannot see, but it also turns each space into a NUL byte, so you should probably include spaces in the scoring function. "In English, the space is slightly more frequent than the top letter (e)".
Speaking of the scoring function, it appears to only have 25 characters? And the order doesn't make any sense to me. The highly-recognizable "etaoin" is missing. You may wish to double-check your work there.
I'd adjust your tests to assert on the properties of the scoring function that you care about. You don't care about a specific value, you care that:
- a more-common letter has a higher score than a less-common one
- ASCII case differences do not change the score
- unknown letters don't crash
This makes me realize that unknown letters will have the same score as 'z', which seems incorrect.
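Two of the points above can be checked in isolation. The following is a minimal sketch, assuming a current Rust toolchain; the function names are made up for illustration and are not part of the crate under review. It shows that max_by_key keeps the last of several equal maxima, that reversing the iterator first recovers the "first wins" behaviour of the original loop, and how max_by can compare f32 values without an unwrap.

```rust
use std::cmp::Ordering;

/// Returns the last item with the highest score, which is what a plain
/// `max_by_key` does when several items tie.
fn last_max_by_key(items: &[(char, usize)]) -> Option<(char, usize)> {
    items.iter().max_by_key(|item| item.1).copied()
}

/// Returns the first item with the highest score by reversing the
/// iterator before calling `max_by_key`.
fn first_max_by_key(items: &[(char, usize)]) -> Option<(char, usize)> {
    items.iter().rev().max_by_key(|item| item.1).copied()
}

/// Picks the largest `f32`. Incomparable pairs (those involving NaN) are
/// treated as equal instead of unwrapping the result of `partial_cmp`.
fn max_f32(values: &[f32]) -> Option<f32> {
    values
        .iter()
        .copied()
        .max_by(|a, b| a.partial_cmp(b).unwrap_or(Ordering::Equal))
}

#[test]
fn test_tie_breaking() {
    // 'X' and 'x' tie, mirroring the two candidate keys in the review.
    let scored = [('X', 10), ('a', 3), ('x', 10)];
    assert_eq!(last_max_by_key(&scored), Some(('x', 10)));
    assert_eq!(first_max_by_key(&scored), Some(('X', 10)));
}
```

Treating NaN as equal is only one possible policy; whether it is appropriate depends on whether NaN scores can occur at all.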

// Only needed on older Rust toolchains; newer ones provide
// `to_ascii_lowercase` as an inherent method on `char`.
use std::ascii::AsciiExt;
use utils::hex_to_bytes;
use s1c2::fixed_xor;

/// Scores ASCII text. The higher the score, the more common
/// English characters the text contains. Letter frequencies are taken from
/// https://en.wikipedia.org/wiki/Letter_frequency.
fn score_text(text: &str) -> usize {
    // Ordered from least to most common; the space ranks above 'e'.
    let frequencies = "zqxjkvbpygfwmucldrhsnioate ";
    text.chars().map(|letter| {
        // Unknown characters score 0; known ones score their rank, starting at 1.
        frequencies.find(letter.to_ascii_lowercase()).map_or(0, |score| score + 1)
    }).sum()
}

/// Tries to decrypt text encrypted with a single character XOR
/// encryption.
pub fn decrypt_xor(ciphertext: &str) -> Option<(char, String)> {
    let cipherbytes = hex_to_bytes(ciphertext);
    // 32 to 127 should cover printable ASCII characters
    (32..128).map(|character| {
        let cipher = vec![character; cipherbytes.len()];
        let plaintext = fixed_xor(&cipherbytes, &cipher);
        (character as char, String::from_utf8(plaintext).expect("Wasn't UTF-8"))
    }).max_by_key(|a| score_text(&a.1))
}

#[test]
fn test_score_text() {
    assert!(score_text("e") > score_text("x"));
    assert_eq!(score_text("e"), score_text("E"));
    assert!(score_text("$") < score_text("a"));
}

#[test]
fn test_decrypt_xor() {
    assert_eq!(decrypt_xor("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"),
               Some(('X', "Cooking MC's like a pound of bacon".to_string())));
}

The scoring idea is interesting, but I'm not in love with it. Because each letter only contributes its rank, two Zs count the same as one Q, which shouldn't be the case. You could probably make better use of the relative frequencies.
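One way to use relative frequencies is to weight each character by roughly how often it occurs rather than by its rank. The sketch below is illustrative only: the weights are rounded, approximate percentages in tenths of a percent, the space weight is an assumption chosen to sit just above 'e', and the names weighted_score, LETTER_WEIGHTS and SPACE_WEIGHT are made up for this example. It assumes a current Rust toolchain.

```rust
/// Approximate English letter weights in tenths of a percent, indexed
/// from 'a' to 'z'. Rounded, illustrative values, not an authoritative table.
const LETTER_WEIGHTS: [usize; 26] = [
    82, 15, 28, 43, 127, 22, 20, 61, 70, 2, 8, 40, 24, // a to m
    67, 75, 19, 1, 60, 63, 91, 28, 10, 24, 2, 20, 1, // n to z
];

/// Assumed weight for a space, set slightly above 'e'.
const SPACE_WEIGHT: usize = 130;

/// Sums the weight of every character. Letters are matched
/// case-insensitively; anything that is not a letter or a space scores 0.
fn weighted_score(text: &str) -> usize {
    text.chars()
        .map(|letter| match letter.to_ascii_lowercase() {
            ' ' => SPACE_WEIGHT,
            c @ 'a'..='z' => LETTER_WEIGHTS[(c as u8 - b'a') as usize],
            _ => 0,
        })
        .sum()
}

#[test]
fn test_weighted_score() {
    assert!(weighted_score("e") > weighted_score("x"));
    assert_eq!(weighted_score("e"), weighted_score("E"));
    assert!(weighted_score("$") < weighted_score("a"));
}
```

Because the result is still a usize, it can be passed to max_by_key in the same way as score_text.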

Shepmaster · Accepted Answer · 2016-11-25 23:29:50Z

