Skip to main content
Code Review

Return to Question

deleted 77 characters in body
Source Link
  • Is this code actually notably faster than the sorting method? In my tests it seemed to perform O(n), as Wikipedia suggested
  • Does the use of a closure and passing noop to later iterations slow down the code notably?
  • Is the structure and organization of the code sensible?
  • Is the Clone requirement for median_and_mode sensible? I ran into issues with the borrow checker, which I still don’t fully understand, and used Clone, but I’m not sure if it’s a sensible requirement
  • Is the Clone requirement for select_and_iterate sensible? It comes from the call to clone in pivot, where I assume it’s because the compiler must assume that the mutable borrow in swap might result in a dangling pointer. Would this be a good place to use unsafe? Or is there some hack I’m not aware of that might circumvent this need? I assume it’s not that big of a deal as any other use of the same code could just pass references to be compared, as is done here, too, to avoid median_and_mode taking a mutable reference
  • Is this code actually notably faster than the sorting method? In my tests it seemed to perform O(n), as Wikipedia suggested
  • Does the use of a closure and passing noop to later iterations slow down the code notably?
  • Is the structure and organization of the code sensible?
  • Is the Clone requirement for median_and_mode sensible? I ran into issues with the borrow checker, which I still don’t fully understand, and used Clone, but I’m not sure if it’s a sensible requirement
  • Is the Clone requirement for select_and_iterate sensible? It comes from the call to clone in pivot, where I assume it’s because the compiler must assume that the mutable borrow in swap might result in a dangling pointer. Would this be a good place to use unsafe? Or is there some hack I’m not aware of that might circumvent this need? I assume it’s not that big of a deal as any other use of the same code could just pass references to be compared, as is done here, too, to avoid median_and_mode taking a mutable reference
  • Is this code actually notably faster than the sorting method? In my tests it seemed to perform O(n), as Wikipedia suggested
  • Does the use of a closure and passing noop to later iterations slow down the code notably?
  • Is the structure and organization of the code sensible?
  • Is the Clone requirement for median_and_mode sensible? I ran into issues with the borrow checker, which I still don’t fully understand, and used Clone, but I’m not sure if it’s a sensible requirement
  • Is the Clone requirement for select_and_iterate sensible? It comes from the call to clone in pivot, where I assume it’s because the compiler must assume that the mutable borrow in swap might result in a dangling pointer. Would this be a good place to use unsafe? Or is there some hack I’m not aware of that might circumvent this need? I assume it’s not that big of a deal as any other use of the same code could just pass references to be compared
Source Link

Finding the Median and Mode of a slice in Rust

Overview

I’m trying to learn Rust, so I’m reading The Rust Programming Language. At Chapter 8, there was a task to write a program that calculates the median and mode of a bunch of numbers. I originally skipped the task and kept reading but I’ve returned to it as I think it will give me some practice with the language. I’ve designed my solution as a library crate that provides a function pub fn median_and_mode<T: Ord + Eq + Hash + Clone>(values: &mut [T]) -> Option<MedianAndMode<T>>.

The directory structure is as follows:

src
├── common.rs
├── lib.rs
├── select_and_iterate
│  └── test.rs
├── select_and_iterate.rs
└── test.rs

I took a hopefully more efficient approach to the task than sorting the list, by using the median of medians algorithm and quickselect, which I largely copied from Wikipedia, but slightly modified to also count each element. To do this I used a point in the algorithm where it already looped over every element. However, I’m unsure of if my use of a closure to do this might slow things down, as later iterations are passed a noop closure, which might slow things down a bit.

I used the proptest crate to create property-based tests for my code.

Here are the source files from my project:

lib.rs

mod common;
mod select_and_iterate;
#[cfg(test)]
mod test;
use std::{
 collections::{HashMap, HashSet},
 hash::Hash,
};
use crate::{common::noop, select_and_iterate::select_and_iterate};
#[derive(Debug, PartialEq, Eq)]
pub enum Median<T> {
 At(T),
 Between(T, T),
}
#[derive(Debug, PartialEq, Eq)]
pub struct Mode<T: Eq + Hash>(HashSet<T>);
#[derive(Debug, PartialEq, Eq)]
pub struct MedianAndMode<T: Eq + Hash> {
 pub median: Median<T>,
 pub mode: Mode<T>,
}
pub fn median_and_mode<T: Ord + Eq + Hash + Clone>(values: &mut [T]) -> Option<MedianAndMode<T>> {
 let len = values.len();
 if len == 0 {
 return None;
 }
 let mut frequencies = HashMap::new();
 let action = |x: &T| {
 let frequency = frequencies.entry((*x).clone()).or_insert(0);
 *frequency += 1;
 };
 let median;
 if len % 2 == 1 {
 let middle = len / 2;
 let Some(median_index) = select_and_iterate(values, middle, action)
 else { return None; };
 median = Median::At(values[median_index].clone());
 } else {
 let middle_1 = len / 2 - 1;
 let middle_2 = len / 2;
 let Some(median_1_index) = select_and_iterate(values, middle_1, action)
 else { return None; };
 let median_1 = values[median_1_index].clone();
 let Some(median_2_index) = select_and_iterate(values, middle_2, noop)
 else { panic!() };
 let median_2 = values[median_2_index].clone();
 if median_1 == median_2 {
 median = Median::At(median_1);
 } else {
 median = Median::Between(median_1, median_2);
 }
 }
 let mode = Mode(get_mode(frequencies));
 Some(MedianAndMode { median, mode })
}
fn get_mode<T: Eq + Hash>(frequencies: HashMap<T, usize>) -> HashSet<T> {
 let mut modes = HashSet::new();
 let mut highest_frequency = 0;
 for (value, frequency) in frequencies {
 match frequency.cmp(&highest_frequency) {
 std::cmp::Ordering::Less => {}
 std::cmp::Ordering::Equal => {
 modes.insert(value);
 }
 std::cmp::Ordering::Greater => {
 highest_frequency = frequency;
 modes.clear();
 modes.insert(value);
 }
 }
 }
 modes
}

select_and_iterate.rs

#[cfg(test)]
mod test;
use std::cmp::{min, Ordering};
use crate::common::noop;
// Algorithm stolen wholesale from Wikipedia: https://en.wikipedia.org/wiki/Median_of_medians
pub fn select_and_iterate<T: Ord + Clone>(
 values: &mut [T],
 index: usize,
 action: impl FnMut(&T),
) -> Option<usize> {
 let len = values.len();
 if len == 0 || index > len {
 return None;
 }
 Some(select_and_iterate_inner(values, index, action))
}
fn select_and_iterate_inner<T: Ord + Clone>(
 values: &mut [T],
 index: usize,
 mut action: impl FnMut(&T),
) -> usize {
 let len = values.len();
 debug_assert_ne!(len, 0);
 if len == 1 {
 debug_assert_eq!(index, 0);
 action(&values[0]);
 return 0;
 }
 let pivot_index = pivot(values);
 let pivot_index = partition(values, pivot_index, index, action);
 match index.cmp(&pivot_index) {
 Ordering::Less => select_and_iterate_inner(&mut values[0..pivot_index], index, noop),
 Ordering::Equal => pivot_index,
 Ordering::Greater => {
 select_and_iterate_inner(
 &mut values[pivot_index + 1..len],
 index - (pivot_index + 1),
 noop,
 ) + (pivot_index + 1)
 }
 }
}
fn pivot<T: Ord + Clone>(values: &mut [T]) -> usize {
 let len = values.len();
 if len <= 5 {
 return median_of_5(values);
 }
 for i in (0..len - 1).step_by(5) {
 let right_index = min(i + 4, len - 1);
 let median = median_of_5(&mut values[i..right_index]);
 values.swap(median, i / 5);
 }
 select_and_iterate_inner(&mut values[0..len / 5], (len - 1) / 10, noop)
}
fn median_of_5<T: Ord>(values: &mut [T]) -> usize {
 values.sort();
 (values.len() - 1) / 2
}
fn partition<T: Ord + Clone>(
 values: &mut [T],
 pivot_index: usize,
 target_index: usize,
 mut action: impl FnMut(&T),
) -> usize {
 let len = values.len();
 let pivot_value_ref = &values[pivot_index];
 action(pivot_value_ref);
 let pivot_value = pivot_value_ref.clone();
 values.swap(pivot_index, len - 1);
 let mut store_index = 0;
 for i in 0..len - 1 {
 action(&values[i]);
 if values[i] < pivot_value {
 values.swap(store_index, i);
 store_index += 1;
 }
 }
 let mut store_index_eq = store_index;
 for i in store_index..len - 1 {
 if values[i] == pivot_value {
 values.swap(store_index_eq, i);
 store_index_eq += 1;
 }
 }
 values.swap(len - 1, store_index_eq);
 if target_index < store_index {
 store_index
 } else if target_index <= store_index_eq {
 target_index
 } else {
 store_index_eq
 }
}

common.rs

pub fn noop<T>(_: &T) {}

test.rs

use super::{median_and_mode, Median, MedianAndMode, Mode};
use proptest::{collection::vec, prop_assert, prop_assert_eq, prop_assert_ne, proptest};
use std::collections::HashSet;
// See https://github.com/proptest-rs/proptest/issues/256
#[test]
fn test_median_and_mode_empty_array() {
 let mut values: [i128; 0] = [];
 let None = median_and_mode(&mut values)
 else { panic!("Wrong result pattern") };
}
#[test]
fn test_median_and_mode_1() {
 let mut values: [i128; 12] = [
 30050, 17767, 12534, -24364, 20538, -17, 690, -7966, -40, -1172, -25598, 34,
 ];
 let Some(MedianAndMode { median, mode }) = median_and_mode(&mut values)
 else { panic!() };
 assert_eq!(median, Median::Between(-17, 34));
 assert_eq!(mode, Mode(HashSet::from_iter(values.iter().copied())))
}
#[test]
fn test_median_and_mode_2() {
 let mut values: [i128; 9] = [
 7952,
 19412,
 -1450,
 6978825196251534519,
 11125,
 5270098434161345047,
 -13739,
 -27060,
 -467,
 ];
 let Some(MedianAndMode { median, mode }) = median_and_mode(&mut values)
 else { panic!("Wrong result pattern") };
 assert_eq!(median, Median::At(7952));
 assert_eq!(mode, Mode(HashSet::from_iter(values.iter().copied())));
}
proptest! {
 #[test]
 fn proptest_median_and_mode(mut values in vec(i8::MIN..i8::MAX, 1..32768)) {
 let len = values.len();
 if len == 0 {
 let None = median_and_mode(&mut values)
 else { panic!("Wrong result pattern") };
 } else {
 let Some(MedianAndMode { median, mode }) = median_and_mode(&mut values)
 else { panic!("Wrong result pattern") };
 values.sort();
 match median {
 Median::At(x) => {
 prop_assert_eq!(x, values[len / 2]);
 },
 Median::Between(x, y) => {
 prop_assert_eq!(x, values[len / 2 - 1]);
 prop_assert_eq!(y, values[len / 2]);
 },
 }
 let Mode(mode) = mode;
 prop_assert_ne!(mode.len(), 0);
 let mut frequencies = Vec::new();
 for value in mode {
 frequencies.push(values.iter().filter(|n| **n == value).count())
 };
 let first_frequency = frequencies[0];
 prop_assert!(first_frequency <= len);
 prop_assert!(frequencies.iter().all(|n| *n == first_frequency));
 }
 }
 #[test]
 fn proptest_median_and_mode_singleton_vec(value in i128::MIN..i128::MAX) {
 let mut values = vec![value];
 let Some(MedianAndMode { median, mode }) = median_and_mode(&mut values)
 else { panic!("Wrong result pattern") };
 prop_assert_eq!(median, Median::At(value));
 prop_assert_eq!(mode, Mode(HashSet::from([value])))
 }
}

select_and_iterate/test.rs

use super::select_and_iterate;
use crate::common::noop;
use proptest::{collection::vec, prop_assert, prop_assert_eq, proptest};
use std::collections::HashMap;
#[test]
fn test_select_empty_vec() {
 let mut values: Vec<i128> = vec![];
 let None = select_and_iterate(&mut values, 0, noop)
 else { panic!("Wrong result pattern") };
}
proptest! {
 #[test]
 fn proptest_select(mut values in vec(i8::MIN..i8::MAX, 1..32768), index in 0..usize::MAX) {
 let len = values.len();
 if len == 0 {
 let None = select_and_iterate(&mut values, index, noop)
 else { panic!("Wrong result pattern") };
 } else {
 let index = index % len;
 let mut frequencies = HashMap::new();
 let action = |x: &i8| {
 let frequency = frequencies.entry(*x).or_insert(0);
 *frequency += 1;
 };
 let Some(value_index) = select_and_iterate(&mut values, index, action)
 else { panic!() };
 let value = values[value_index];
 values.sort();
 prop_assert_eq!(value, values[index]);
 for (value, frequency) in frequencies {
 prop_assert!(values.contains(&value));
 prop_assert!(frequency <= len);
 }
 }
 }
 #[test]
 fn proptest_select_singleton_vec(value in i128::MIN..i128::MAX) {
 let mut counter = 0;
 let action = |_: &i128| {
 counter += 1;
 };
 let mut values = vec![value];
 let Some(0) = select_and_iterate(&mut values, 0, action)
 else { panic!("Wrong result pattern") };
 prop_assert_eq!(counter, 1);
 }
}

Questions

In particular, I’m interested in these questions:

  • Is this code actually notably faster than the sorting method? In my tests it seemed to perform O(n), as Wikipedia suggested
  • Does the use of a closure and passing noop to later iterations slow down the code notably?
  • Is the structure and organization of the code sensible?
  • Is the Clone requirement for median_and_mode sensible? I ran into issues with the borrow checker, which I still don’t fully understand, and used Clone, but I’m not sure if it’s a sensible requirement
  • Is the Clone requirement for select_and_iterate sensible? It comes from the call to clone in pivot, where I assume it’s because the compiler must assume that the mutable borrow in swap might result in a dangling pointer. Would this be a good place to use unsafe? Or is there some hack I’m not aware of that might circumvent this need? I assume it’s not that big of a deal as any other use of the same code could just pass references to be compared, as is done here, too, to avoid median_and_mode taking a mutable reference
lang-rust

AltStyle によって変換されたページ (->オリジナル) /