Skip to main content
Code Review

Return to Revisions

1 of 2
RubberDuck
  • 31.2k
  • 6
  • 73
  • 176

Frequency Analysis & Chi-Squared Test

Following up on my implementation of Cryptopals Challenge 1, this is my solution to Challenge 3.

Single-byte XOR cipher The hex encoded string:

1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736

... has been XOR'd against a single character. Find the key, decrypt the message.

You can do this by hand. But don't: write code to do it for you.

How? Devise some method for "scoring" a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.

The idea here is that I try to decrypt with every possible single byte repeating key (each extended ascii/utf8 character), then compare the resulting character frequency to the expected frequency with a Chi-Squared test.

$$ \chi^2 = \sum_{i=1}^n \frac{(obs - exp)^2}{exp} $$

If χ2 < the critical value, then the decrypted cipher is determined to be English text, therefore we've cracked the cipher and found the key.

I'm some data and a build script to do some code generation before compiling the rest of the project.

data/english.csv

32,17.1660
101,8.5771
116,6.3700
111,5.7701
97,5.1880
...

build.rs

use std::env;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::io::Write;
use std::path::Path;
fn main() {
 let declaration = String::from("fn english_frequencies() -> HashMap<u8, f32> {[");
 // csv must be 2 columns, no header
 // ascii number, frequency as percentage
 // 32,17.16660
 let file = File::open("data/english.csv").unwrap();
 let reader = BufReader::new(&file);
 let formatted_lines = reader
 .lines()
 .map(|line| format!("({}),\n", line.unwrap()))
 .collect();
 let close = String::from("].iter().cloned().collect()}");
 let out_dir = env::var("OUT_DIR").unwrap();
 let dest_path = Path::new(&out_dir).join("english_frequencies.rs");
 let mut f = File::create(&dest_path).unwrap();
 f.write_all(
 &[declaration, formatted_lines, close]
 .join("\n")
 .into_bytes(),
 ).unwrap();
}

This generated table of expected frequencies is then included in a module that implements the frequency analysis.

src/frequency.rs

use std::collections::HashMap;
include!(concat!(env!("OUT_DIR"), "/english_frequencies.rs"));
pub fn english(message: &str) -> bool {
 let expected_counts: HashMap<char, f32> = english_frequencies()
 .iter()
 .map(|(k, freq)| (k.clone() as char, (freq / 100.0) * (message.len() as f32)))
 .collect();
 let actual_counts = message
 .chars()
 .fold(HashMap::new(), |mut acc: HashMap<char, isize>, c| {
 let count = match acc.get(&c) {
 Some(x) => x.clone() + 1,
 None => 1,
 };
 acc.insert(c, count);
 acc
 });
 let chi_statistic = chi_statistic(actual_counts, expected_counts);
 if cfg!(debug_assertions) {
 println!("X-statistic: {}", chi_statistic);
 }
 // Degrees of freedom = 256 - 1 = 255 (character space)
 // Usign this table:
 // https://en.wikibooks.org/wiki/Engineering_Tables/Chi-Squared_Distibution
 // We can use the approximate value for 250 degrees of fredom.
 // Given a significance factor (alpha) of 0.05, our critical value is 287.882.
 // If our chi_statistic is < the critical_value, then we have a match.
 // See this page for an explanation:
 // https://en.wikipedia.org/wiki/Chi-squared_distribution#Table_of_%CF%872_values_vs_p-values
 chi_statistic < 287.882
}
/// Calculates Pearson's Cumulative Chi Statistic
/// https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Calculating_the_test-statistic
///
/// This is a slight variation.
/// Technichally, if the expected value is zero and the actual is non-zero, then the statistic is infinite.
/// For the sake of ergonommics, this implementation assumes missing expected values to be small, but non-zero.
/// This allows us to only specify values in the expected frequencies that are statistically
/// significant while allowing for all valid utf-8 characters in the message.
fn chi_statistic(observed: HashMap<char, isize>, expected: HashMap<char, f32>) -> f32 {
 observed
 .into_iter()
 .map(|(key, obs)| {
 let exp = match expected.get(&key) {
 Some(x) => x.clone() as f32,
 None => 0.0000001, //non-zero, but tiny possibility
 };
 (obs as f32 - exp).powi(2) / exp
 }).sum()
}
#[cfg(test)]
mod tests {
 use super::*;
 #[test]
 fn bacon_message_is_english() {
 let message = "Cooking MC's like a pound of bacon";
 assert!(english(message));
 }
 #[test]
 fn message_with_unprintable_chars_is_not_english() {
 assert!(!english(
 "\u{7f}SSWUR[\u{1c}q\u{7f}\u{1b}O\u{1c}PUWY\u{1c}]\u{1c}LSIRX\u{1c}SZ\u{1c}^]_SR"
 ));
 }
 #[test]
 fn printable_nonsense_is_not_english() {
 assert!(!english("Yuuqst}:WY=i:vsq\u{7f}:{:juot~:u|:x{yut"));
 }
 #[test]
 fn readable_but_incorrect_is_not_english() {
 assert!(!english(
 "cOOKING\u{0}mc\u{7}S\u{0}LIKE\u{0}A\u{0}POUND\u{0}OF\u{0}BACON"
 ));
 }
}

Finally, we call it from the main program and crack the cipher.

src/main.rs

extern crate cryptopals;
use cryptopals::byte_array::xor;
use cryptopals::frequency;
use cryptopals::hex;
use std::iter;
fn main() {
 let secret =
 hex::to_bytes("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736");
 println!("{:?}", crack_xor(&secret));
}
fn crack_xor(cipher: &[u8]) -> Vec<String> {
 let alphabet = 0..255u8; /* ascii/utf-8 range */
 alphabet
 .into_iter()
 .filter_map(|c| {
 let key = iter::repeat(c).take(cipher.len()).collect::<Vec<u8>>();
 let decrypted = xor(&cipher, &key);
 match String::from_utf8(decrypted) {
 Ok(s) => Some(s),
 Err(_) => None,
 }
 }).filter(|s| frequency::english(s))
 .collect::<Vec<String>>()
}
#[cfg(test)]
mod tests {
 use super::*;
 #[test]
 fn analysis_matches_bacon_message() {
 let secret =
 hex::to_bytes("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736");
 let actual = crack_xor(&secret);
 assert_eq!(vec!["Cooking MC's like a pound of bacon"], actual);
 }
}
RubberDuck
  • 31.2k
  • 6
  • 73
  • 176
lang-rust

AltStyle によって変換されたページ (->オリジナル) /