Skip to main content
Code Review

Return to Question

Notice removed Draw attention by RubberDuck
Bounty Ended with Winston Ewert's answer chosen by RubberDuck
Notice added Draw attention by RubberDuck
Bounty Started worth 100 reputation by RubberDuck
Tweeted twitter.com/StackCodeReview/status/1044738448261828609
inline mathjax
Source Link
Vogel612
  • 25.5k
  • 7
  • 59
  • 141

If χ2\$\chi^2\$ < the critical value, then the decrypted cipher is determined to be English text, therefore we've cracked the cipher and found the key.

If χ2 < the critical value, then the decrypted cipher is determined to be English text, therefore we've cracked the cipher and found the key.

If \$\chi^2\$ < the critical value, then the decrypted cipher is determined to be English text, therefore we've cracked the cipher and found the key.

Source Link
RubberDuck
  • 31.1k
  • 6
  • 73
  • 176

Frequency Analysis & Chi-Squared Test

Following up on my implementation of Cryptopals Challenge 1, this is my solution to Challenge 3.

Single-byte XOR cipher The hex encoded string:

1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736

... has been XOR'd against a single character. Find the key, decrypt the message.

You can do this by hand. But don't: write code to do it for you.

How? Devise some method for "scoring" a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.

The idea here is that I try to decrypt with every possible single byte repeating key (each extended ascii/utf8 character), then compare the resulting character frequency to the expected frequency with a Chi-Squared test.

$$ \chi^2 = \sum_{i=1}^n \frac{(obs - exp)^2}{exp} $$

If χ2 < the critical value, then the decrypted cipher is determined to be English text, therefore we've cracked the cipher and found the key.

I'm some data and a build script to do some code generation before compiling the rest of the project.

data/english.csv

32,17.1660
101,8.5771
116,6.3700
111,5.7701
97,5.1880
...

build.rs

use std::env;
use std::fs::File;
use std::io::BufRead;
use std::io::BufReader;
use std::io::Write;
use std::path::Path;
fn main() {
 let declaration = String::from("fn english_frequencies() -> HashMap<u8, f32> {[");
 // csv must be 2 columns, no header
 // ascii number, frequency as percentage
 // 32,17.16660
 let file = File::open("data/english.csv").unwrap();
 let reader = BufReader::new(&file);
 let formatted_lines = reader
 .lines()
 .map(|line| format!("({}),\n", line.unwrap()))
 .collect();
 let close = String::from("].iter().cloned().collect()}");
 let out_dir = env::var("OUT_DIR").unwrap();
 let dest_path = Path::new(&out_dir).join("english_frequencies.rs");
 let mut f = File::create(&dest_path).unwrap();
 f.write_all(
 &[declaration, formatted_lines, close]
 .join("\n")
 .into_bytes(),
 ).unwrap();
}

This generated table of expected frequencies is then included in a module that implements the frequency analysis.

src/frequency.rs

use std::collections::HashMap;
include!(concat!(env!("OUT_DIR"), "/english_frequencies.rs"));
pub fn english(message: &str) -> bool {
 let expected_counts: HashMap<char, f32> = english_frequencies()
 .iter()
 .map(|(k, freq)| (k.clone() as char, (freq / 100.0) * (message.len() as f32)))
 .collect();
 let actual_counts = message
 .chars()
 .fold(HashMap::new(), |mut acc: HashMap<char, isize>, c| {
 let count = match acc.get(&c) {
 Some(x) => x.clone() + 1,
 None => 1,
 };
 acc.insert(c, count);
 acc
 });
 let chi_statistic = chi_statistic(actual_counts, expected_counts);
 if cfg!(debug_assertions) {
 println!("X-statistic: {}", chi_statistic);
 }
 // Degrees of freedom = 256 - 1 = 255 (character space)
 // Usign this table:
 // https://en.wikibooks.org/wiki/Engineering_Tables/Chi-Squared_Distibution
 // We can use the approximate value for 250 degrees of fredom.
 // Given a significance factor (alpha) of 0.05, our critical value is 287.882.
 // If our chi_statistic is < the critical_value, then we have a match.
 // See this page for an explanation:
 // https://en.wikipedia.org/wiki/Chi-squared_distribution#Table_of_%CF%872_values_vs_p-values
 chi_statistic < 287.882
}
/// Calculates Pearson's Cumulative Chi Statistic
/// https://en.wikipedia.org/wiki/Pearson%27s_chi-squared_test#Calculating_the_test-statistic
///
/// This is a slight variation.
/// Technichally, if the expected value is zero and the actual is non-zero, then the statistic is infinite.
/// For the sake of ergonommics, this implementation assumes missing expected values to be small, but non-zero.
/// This allows us to only specify values in the expected frequencies that are statistically
/// significant while allowing for all valid utf-8 characters in the message.
fn chi_statistic(observed: HashMap<char, isize>, expected: HashMap<char, f32>) -> f32 {
 observed
 .into_iter()
 .map(|(key, obs)| {
 let exp = match expected.get(&key) {
 Some(x) => x.clone() as f32,
 None => 0.0000001, //non-zero, but tiny possibility
 };
 (obs as f32 - exp).powi(2) / exp
 }).sum()
}
#[cfg(test)]
mod tests {
 use super::*;
 #[test]
 fn bacon_message_is_english() {
 let message = "Cooking MC's like a pound of bacon";
 assert!(english(message));
 }
 #[test]
 fn message_with_unprintable_chars_is_not_english() {
 assert!(!english(
 "\u{7f}SSWUR[\u{1c}q\u{7f}\u{1b}O\u{1c}PUWY\u{1c}]\u{1c}LSIRX\u{1c}SZ\u{1c}^]_SR"
 ));
 }
 #[test]
 fn printable_nonsense_is_not_english() {
 assert!(!english("Yuuqst}:WY=i:vsq\u{7f}:{:juot~:u|:x{yut"));
 }
 #[test]
 fn readable_but_incorrect_is_not_english() {
 assert!(!english(
 "cOOKING\u{0}mc\u{7}S\u{0}LIKE\u{0}A\u{0}POUND\u{0}OF\u{0}BACON"
 ));
 }
}

Finally, we call it from the main program and crack the cipher.

src/main.rs

extern crate cryptopals;
use cryptopals::byte_array::xor;
use cryptopals::frequency;
use cryptopals::hex;
use std::iter;
fn main() {
 let secret =
 hex::to_bytes("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736");
 println!("{:?}", crack_xor(&secret));
}
fn crack_xor(cipher: &[u8]) -> Vec<String> {
 let alphabet = 0..255u8; /* ascii/utf-8 range */
 alphabet
 .into_iter()
 .filter_map(|c| {
 let key = iter::repeat(c).take(cipher.len()).collect::<Vec<u8>>();
 let decrypted = xor(&cipher, &key);
 match String::from_utf8(decrypted) {
 Ok(s) => Some(s),
 Err(_) => None,
 }
 }).filter(|s| frequency::english(s))
 .collect::<Vec<String>>()
}
#[cfg(test)]
mod tests {
 use super::*;
 #[test]
 fn analysis_matches_bacon_message() {
 let secret =
 hex::to_bytes("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736");
 let actual = crack_xor(&secret);
 assert_eq!(vec!["Cooking MC's like a pound of bacon"], actual);
 }
}
lang-rust

AltStyle によって変換されたページ (->オリジナル) /