The code, along with the utils module, is also on GitHub. If you also have any advice on how the whole crate is structured, I'd love to hear it. :)
use std::ascii::AsciiExt;
use std::str;
use utils::hex_to_bytes;
use s1c2::fixed_xor;

/// Scores ASCII text represented by byte array. The higher the score, the more common
/// English characters the text contains. Letter frequencies are taken from
/// https://en.wikipedia.org/wiki/Letter_frequency.
fn score_text(text: &[u8]) -> f32 {
    let frequencies = "xzqkjupnlgeyihrmfsdcbwaot";
    let text = str::from_utf8(text).unwrap();
    let score: usize = text.chars().map(|letter| {
        frequencies.find(letter.to_ascii_lowercase()).unwrap_or(0)
    }).sum();
    score as f32/text.len() as f32
}

/// Tries to decrypt text encrypted with a single character XOR
/// encryption.
pub fn decrypt_xor(ciphertext: &str) -> Option<(char, String)> {
    let cipherbytes = hex_to_bytes(ciphertext);
    let mut max = 0.0;
    let mut best_solution = None;
    // 32 to 127 should cover printable ASCII characters
    for character in 32..128 {
        let cipher = vec![character; cipherbytes.len()];
        let plaintext = fixed_xor(&cipherbytes, &cipher);
        let score = score_text(&plaintext);
        if score > max {
            max = score;
            best_solution = Some((character as char, String::from_utf8(plaintext).unwrap()));
        }
    }
    best_solution
}

#[test]
fn test_score_text() {
    assert_eq!(score_text(b"x"), 0.0);
    assert_eq!(score_text(b"Z"), 1.0);
    assert_eq!(score_text(b"$"), 0.0);
    assert_eq!(score_text(b"zZz"), 1.0);
}

#[test]
fn test_decrypt_xor() {
    assert_eq!(decrypt_xor("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"),
               Some(('X', "Cooking MC's like a pound of bacon".to_string())));
}
1 Answer
- Perform the UTF-8 conversion outside of `score_text`. With a name like `_text`, it should probably accept a `&str` anyway. Note that types are a kind of static assertion about the data; `&str` is a way of saying "UTF-8 encoded bytes". This also helps highlight that the UTF-8 check was being done twice.
- Use `expect` instead of `unwrap`. When the code fails, you will be thankful.
- The tracking of the maximum is annoying. Ideally, you'd be able to just say `max_by_key` on the iterator, but `f32`s don't implement `Ord`. There's the possibility of creating a wrapper type that ensures it is never `NaN`, but that feels like overkill here. You could use `max_by`, but that's not currently stable, and would still involve an `unwrap`. Instead, I'd note that all of the strings are the same length here, so dividing them by the length is unnecessary. Adjust the score function to return the previous numerator as a `usize` and then you can use `max_by_key`.
- Making this change shows that you didn't just want "max", you were relying on finding the first one. Evidently, the `max_by_key` algorithm returns the last-most equal value. This causes your tests to fail and pick `x` instead of `X` as the key. You would have seen this problem previously if the key had been `x`.
- You should probably include spaces in the scoring function. "In English, the space is slightly more frequent than the top letter (e)".
- Speaking of the scoring function, it appears to only have 25 characters? And the order doesn't make any sense to me. The highly-recognizable "etaoin" is missing. You may wish to double-check your work there.
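To make the tie-breaking and float-comparison points above concrete, here is a small sketch. It assumes a current stable toolchain, where `Iterator::max_by` has since been stabilized and `f32::total_cmp` is available; `first_max_by_key` and `best_by_float_score` are hypothetical helper names, not part of the reviewed crate.

```rust
use std::cmp::Reverse;

/// Hypothetical helper: returns the FIRST item with the highest key.
/// `max_by_key` keeps the last of several equal maxima, while `min_by_key`
/// keeps the first of several equal minima, so reversing the ordering
/// gives "first maximum wins".
fn first_max_by_key<I, K, F>(iter: I, mut key: F) -> Option<I::Item>
where
    I: Iterator,
    K: Ord,
    F: FnMut(&I::Item) -> K,
{
    iter.min_by_key(|item| Reverse(key(item)))
}

/// Hypothetical helper: picks the entry with the highest `f32` score.
/// `total_cmp` defines a total order on floats (a positive NaN sorts above
/// every number), so neither a wrapper type nor an `unwrap` is needed.
fn best_by_float_score<T>(scored: Vec<(T, f32)>) -> Option<(T, f32)> {
    scored.into_iter().max_by(|a, b| a.1.total_cmp(&b.1))
}
```

With candidates `('X', 3)` and `('x', 3)` in that order, plain `max_by_key` yields `'x'`, whereas `first_max_by_key` yields `'X'`, which matches what the original loop with `score > max` did.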
I'd adjust your tests to assert on the properties of the scoring function that you care about. You don't care about a specific value, you care that:
- a more-common letter has a higher score than a less-common one
- ASCII case differences do not change the score
- unknown letters don't crash
This makes me realize that unknown letters will have the same score as 'x', the lowest-ranked letter in your table, which seems incorrect.
use std::ascii::AsciiExt;
use utils::hex_to_bytes;
use s1c2::fixed_xor;

/// Scores ASCII text. The higher the score, the more common English
/// characters (including the space) the text contains. Letter frequencies
/// are taken from https://en.wikipedia.org/wiki/Letter_frequency.
fn score_text(text: &str) -> usize {
    // Ordered from least to most common; unknown characters score 0.
    let frequencies = "zqxjkvbpygfwmucldrhsnioate ";
    text.chars().map(|letter| {
        frequencies.find(letter.to_ascii_lowercase()).map_or(0, |score| score + 1)
    }).sum()
}

/// Tries to decrypt text encrypted with a single character XOR
/// encryption.
pub fn decrypt_xor(ciphertext: &str) -> Option<(char, String)> {
    let cipherbytes = hex_to_bytes(ciphertext);
    // 32 to 127 should cover printable ASCII characters
    (32..128).map(|character| {
        let cipher = vec![character; cipherbytes.len()];
        let plaintext = fixed_xor(&cipherbytes, &cipher);
        (character as char, String::from_utf8(plaintext).expect("Wasn't UTF-8"))
    }).max_by_key(|a| score_text(&a.1))
}

#[test]
fn test_score_text() {
    assert!(score_text("e") > score_text("x"));
    assert_eq!(score_text("e"), score_text("E"));
    assert!(score_text("$") < score_text("a"));
}

#[test]
fn test_decrypt_xor() {
    assert_eq!(decrypt_xor("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"),
               Some(('X', "Cooking MC's like a pound of bacon".to_string())));
}
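The scoring idea is interesting, but I'm not in love with it. Because the score is just a rank, two occurrences of a rare letter can count the same as one occurrence of a more common letter (two Zs score the same as one Q). You could probably make better use of the relative frequencies.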
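One way to use relative frequencies is to sum a per-character weight instead of a rank. The following is only a sketch: the weights are approximate English letter percentages multiplied by 100 and rounded, the space weight of 1300 is an assumption based solely on the "slightly more frequent than e" remark, and `char_weight` and `score_text_weighted` are hypothetical names. Check the numbers against the frequency table before relying on them.

```rust
/// Approximate relative frequency of a byte in English text, in hundredths
/// of a percent. ASCII case is ignored; anything not listed weighs 0.
/// NOTE: values are rounded approximations; the space weight is an assumption.
fn char_weight(byte: u8) -> usize {
    match byte.to_ascii_lowercase() {
        b' ' => 1300, // assumed: slightly above 'e'
        b'e' => 1270,
        b't' => 906,
        b'a' => 817,
        b'o' => 751,
        b'i' => 697,
        b'n' => 675,
        b's' => 633,
        b'h' => 609,
        b'r' => 599,
        b'd' => 425,
        b'l' => 403,
        b'c' => 278,
        b'u' => 276,
        b'm' => 241,
        b'w' => 236,
        b'f' => 223,
        b'g' => 202,
        b'y' => 197,
        b'p' => 193,
        b'b' => 149,
        b'v' => 98,
        b'k' => 77,
        b'j' => 15,
        b'x' => 15,
        b'q' => 10,
        b'z' => 7,
        _ => 0,
    }
}

/// Scores text by summing the weight of every byte. The result is still a
/// `usize`, so it can be used directly with `max_by_key`.
fn score_text_weighted(text: &str) -> usize {
    text.bytes().map(char_weight).sum()
}
```

Since the return type is unchanged, this could replace `score_text` in `decrypt_xor` without touching the iterator chain. With these weights, a handful of rare letters no longer adds up to one common letter.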