Idiomatic use of `Option` and `unwrap` in mean, median and mode Rust program for Chapter 8 of The Book

Question 1

I've recently finished Chapter 8 of The Book and have started doing the exercises at the end of the chapter. This post pertains to the first exercise which involves writing a program to output the averages of a list of integers. I thought the instructions for this particular exercise was a bit vague, so I tried to make it simple by not including REPL functionality in my implementation. Anyway, here's my current solution:


// Given a list of integers, use a vector and return the mean (the average
// value), median (when sorted, the value in the middle position), and
// mode (the value that occurs most often; a hash map will be helpful
// here) of the list.
use std::collections::HashMap;
struct Averages {
 mean: f64,
 median: f64,
 mode: Vec<i32>
}
fn sum(vector: &Vec<i32>) -> Option<i32> {
 match vector.len() {
 0 => None,
 _ => {
 let mut accumulator = 0;
 for element in vector {
 accumulator += element;
 }
 Some(accumulator)
 }
 }
}
fn mean(vector: &Vec<i32>) -> Option<f64> {
 match vector.len() {
 0 => None,
 length =>
 Some((sum(vector).unwrap() as f64) / (length as f64))
 }
}
fn is_even(number: &usize) -> bool {
 number % 2 == 0
}
fn median(vector_original: &Vec<i32>) -> Option<f64> {
 match vector_original.len() {
 0 => None,
 length => {
 let mut vector_cloned = vector_original.to_vec();
 vector_cloned.sort_unstable();
 match is_even(&length) {
 true => {
 let length_halved = length / 2;
 Some(((
 &vector_cloned[length_halved - 1] +
 &vector_cloned[length_halved]) as f64) / 2.0)
 },
 false =>
 Some(vector_cloned[((
 (length as f64) /
 2.0).floor() as usize)] as f64)
 }
 }
 }
}
fn mode(vector: &Vec<i32>) -> Option<Vec<i32>> {
 match vector.len() {
 0 => None,
 _ => {
 let mut counts = HashMap::new();
 for number in vector {
 let count = counts.entry(number).or_insert(1u32);
 *count += 1u32;
 }
 let mut max_count = 0;
 for count in counts.values() {
 if count > &max_count {
 max_count = *count;
 }
 }
 let mut modes: Vec<i32> = Vec::new();
 for (number, count) in &counts {
 if count == &max_count {
 modes.push(**number);
 }
 }
 Some(modes)
 }
 }
}
fn get_averages(vector: &Vec<i32>) -> Option<Averages> {
 match vector.len() {
 0 => None,
 _ => Some(
 Averages {
 mean: mean(vector).unwrap(),
 median: median(vector).unwrap(),
 mode: mode(vector).unwrap()
 }
 )
 }
}
fn print_averages(vector: &Vec<i32>) {
 match vector.len() {
 0 => println!("Vector is empty."),
 _ => {
 let averages: Averages = get_averages(vector).unwrap();
 println!(
"Mean: {}
Median: {}
Mode(s): {:?}",
 &averages.mean,
 &averages.median,
 &averages.mode
 );
 }
 }
}
fn main() {
 print_averages(&(10..=20).collect());
 print_averages(&vec![5,5,5,4,3,2,1,1,1]);
 print_averages(&vec![]);
}

I wrote the current implementation like I did as I was trying to write it such that each of the functions are reusable independent of each other and in contexts possibly outside of this particular program (hence the arguably redundant calls to len). But now I'm wondering if using so many unwraps like I did is the idiomatic way to handle the Options or if there's a better way to have written this instead. I'm especially worried that my implementation might be brittle because I encountered a couple of panics when I was writing and testing it.

I'm also unsure if I'm doing borrowing and moving, as well as referencing and dereferencing optimally, especially in the mode function; despite it working properly, I'm a little confused as to what's its actually doing with the data and references, so some explanation about it would be really welcome.

Any other constructive criticisms in general are also welcome, so please nitpick to your hearts content!

Question 2

Read 6005's answer first — the most important issues are covered there. Here are some additions:

Formatting

rustfmt formats your code according to the official Rust Style Guide, which most Rustaceans are familiar with, to ease communication.

Argument passing

As 6005 pointed out, make your parameters &[i32] instead of Vec<i32>. See Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?

There is an exception: for median, you clone the numbers and sort the clone. If the caller has a disposable collection that will not be used after calculating the median, this would be wasteful. I suggest taking a Vec by value here:

fn median(mut numbers: Vec<i32>) -> Option<f64> {
 // ...
 numbers.sort_unstable();
 // ...
}

For Copy types like usize, there is no need to take a reference. Simply pass the usize instead.

`mean`

Since your sum checks for empty collections already, you can use the result rather than check again in mean:

fn mean(numbers: &[i32]) -> Option<f64> {
 (sum(vector)? as f64) / (numbers.len() as f64)
}

where the ? operator automatically unwraps the Option if it is Some and returns None from the enclosing function otherwise.

Arithmetic

This is an unnecessary use of floating point arithmetic:

(((length as f64) / 2.0).floor() as usize)

Integer division rounds toward zero, so what you want is simply length / 2.

`None` vs `NaN`

Since you're already returning f64 for mean and median, an alternative to using Option might be to simply return f64::NAN when the collection is empty, as proper f64 arithmetic should be NAN-aware anyway. None and NAN both have their pros and cons. Personally, I like to use NAN to simplify the type, but the decision is up to you.

Control flow

In median, too many levels of indentation make the code less readable. The matches can be eliminated using if and return.

Combining the previous points, here's roughly how I would write median:

fn median(numbers: Vec<i32>) -> f64 {
 let length = numbers.len();
 if length == 0 {
 return f64::NAN;
 }
 numbers.sort_unstable();
 if is_even(length) {
 let left = numbers[length / 2];
 let right = numbers[length / 2 + 1];
 (left as f64 + right as f64) / 2.0
 } else {
 numbers[length / 2] as f64
 }
}

which hopefully looks simpler.

`mode`

usize should be a more natural choice for the value type of the map.

There are some functionalities from the standard library that you can use to simplify the code, including max, filter, and map:

fn mode(numbers: &[i32]) -> Option<Vec<i32>> {
 let freqs = HashMap::<i32, usize>::new();
 for number in numbers {
 freqs.entry(number).or_insert(0) += 1;
 }
 let max_count = freqs.values().max()?;
 let mode = freqs
 .iter()
 .filter(|(_, &count)| count == max_count)
 .map(|(&number, _)| number)
 .collect();
 Some(mode)
}

`get_averages` & `print_averages`

Empty collections aren't common — I wouldn't waste two branches on them. Simply let the code run through:

fn get_averages(numbers: Vec<i32>) -> Option<Averages> {
 Some(Averages {
 mean: mean(vector)?,
 median: median(vector)?,
 mode: mode(vector)?,
 })
}

The same applies to print_averages — you're wasting three branches in total for non-empty collections. Personally, I think the multiline string literal disrupts the aesthetics of the code, so I got:

fn print_averages(numbers: Vec<i32>) {
 match get_averages(numbers) {
 None => println!("Empty collection"),
 Some(Average { mean, median, mode }) => {
 println!("Mean: {}", mean);
 println!("Median: {}", median);
 println!("Modes: {:?}", mode);
 }
 };
}

Doc comments & tests

As a bonus, you can add doc comments and tests to document your code if you have spare time.

Question 3

Good answer, nice use of ? which I didn't touch on

Question 4

@6005 It seems that we are the only active Rust reviewers now. Let's keep up the good work :)

Question 5

Good job in separating what you did into self-contained functions. Some comments on design and style:

First things first: always run Clippy! (cargo clippy) Its feedback is usually helpful. Here it notices a few problems:
- Whenever you have a function with &Vec<T> as an argument, it should almost always be rewritten as taking a slice &[T]. Think of a slice as a more general vector-like memory, so a function which accepts &[T] is more general than one that accepts &Vec<T>. This also has the side effect of changing your main function a bit:
```
 fn main() {
 let v: Vec<i32> = (10..=20).collect();
 print_averages(&v);
 print_averages(&[5, 5, 5, 4, 3, 2, 1, 1, 1]);
 print_averages(&[]);
 }
```
- In the line ((&vector_cloned[length_halved - 1] + &vector_cloned[length_halved]), you can remove both ampersands.
In your sum function, returning 0 for the sum of an empty list (so, fn sum(vector: &[i32]) -> i32) is cleaner and more standard. Also, there is a built-in for this function: vector.iter().sum()
Your mean function is clear, the use of match vector.len() is clever (the alternative is if vector.is_empty() {, but then you still have to call .len() again in the else branch. So I like your design choice better.)
However in contrast, in mode and print_averages, here it is clearer to use if vector.is_empty() { since you aren't using the length in the logic.
Relatedly, in median when you use match is_even(&length), here I think it is clearer to use an if statement, since there is no value associated with one of the match branches that you need access to.
Also in median I think some of the logic can be clearer using intermediate variables instead of complex expressions. In particular:
```
 let ele1 = vector_cloned[length_halved - 1];
 let ele2 = vector_cloned[length_halved];
 Some(((ele1 + ele2) as f64) / 2.0)
```

To answer your specific questions:

But now I'm wondering if using so many unwraps like I did is the idiomatic way to handle the Options or if there's a better way to have written this instead. I'm especially worried that my implementation might be brittle because I encountered a couple of panics when I was writing and testing it.

Using unwraps here is actually pretty justified: you statically know that these vectors are nonempty so your unwraps should never panic. You are right to be concerned though as normally unwraps are best avoided. If you prefer you can do .expect("unreachable") to indicate what exactly went wrong.

I'm also unsure if I'm doing borrowing and moving, as well as referencing and dereferencing optimally, especially in the mode function.

Edited to add: some comments on the mode function:

Some improvements to your use of references:

You created a HashMap<&i32, u32>; this leads to some weird dereferencing in the code. It would be better to create a HashMap<i32, u32>. You can make sure you are doing it correctly by using a type annotation: let mut counts: HashMap<i32, u32> = HashMap::new();
For the first for loop, you can use .and_modify to simplify the logic: (note vector is a reference / slice):

 let mut counts: HashMap<i32, u32> = HashMap::new();
 for &number in vector {
 counts.entry(number).and_modify(|count| *count += 1).or_insert(1u32);
 }

The second for loop can be avoided using .max() directly on iterators:

let max_count: u32 = *counts.values().max().unwrap()

Once you have a HashMap<i32, u32>, the third for loop becomes much nicer. You can avoid references completely by consuming the hashmap in the for loop:

 for (number, count) in counts {
 if count == max_count {
 modes.push(number);
 }
 }

But the most common way is to iterate by reference, which would normally be written this way:

 for (&number, &count) in &counts {
 if count == max_count {
 modes.push(number);
 }
 }

Or equivalently if you prefer:

 for (number, count) in &counts {
 if *count == max_count {
 modes.push(*number);
 }
 }

You can also do the last loop with a filter instead of an explicit for loop, though when learning it's always good to write out for loops for practice.
Also for larger projects, ultimately the best way to do this would probably be to use a dedicated multiset collection, for instance the multiset crate. The mode body should be really pretty short, like let counts: MultiSet<i32> = vector.collect(); counts.iter().max_by_key(...).

L. F. L. F. 9,6952 gold badges27 silver badges69 bronze badges · Accepted Answer · 2021-02-07 03:50:24Z

Read 6005's answer first — the most important issues are covered there. Here are some additions:

Formatting

rustfmt formats your code according to the official Rust Style Guide, which most Rustaceans are familiar with, to ease communication.

Argument passing

As 6005 pointed out, make your parameters &[i32] instead of Vec<i32>. See Why is it discouraged to accept a reference to a String (&String), Vec (&Vec), or Box (&Box) as a function argument?

There is an exception: for median, you clone the numbers and sort the clone. If the caller has a disposable collection that will not be used after calculating the median, this would be wasteful. I suggest taking a Vec by value here:

fn median(mut numbers: Vec<i32>) -> Option<f64> {
 // ...
 numbers.sort_unstable();
 // ...
}

For Copy types like usize, there is no need to take a reference. Simply pass the usize instead.

`mean`

Since your sum checks for empty collections already, you can use the result rather than check again in mean:

fn mean(numbers: &[i32]) -> Option<f64> {
 (sum(vector)? as f64) / (numbers.len() as f64)
}

where the ? operator automatically unwraps the Option if it is Some and returns None from the enclosing function otherwise.

Arithmetic

This is an unnecessary use of floating point arithmetic:

(((length as f64) / 2.0).floor() as usize)

Integer division rounds toward zero, so what you want is simply length / 2.

`None` vs `NaN`

Since you're already returning f64 for mean and median, an alternative to using Option might be to simply return f64::NAN when the collection is empty, as proper f64 arithmetic should be NAN-aware anyway. None and NAN both have their pros and cons. Personally, I like to use NAN to simplify the type, but the decision is up to you.

Control flow

In median, too many levels of indentation make the code less readable. The matches can be eliminated using if and return.

Combining the previous points, here's roughly how I would write median:

fn median(numbers: Vec<i32>) -> f64 {
 let length = numbers.len();
 if length == 0 {
 return f64::NAN;
 }
 numbers.sort_unstable();
 if is_even(length) {
 let left = numbers[length / 2];
 let right = numbers[length / 2 + 1];
 (left as f64 + right as f64) / 2.0
 } else {
 numbers[length / 2] as f64
 }
}

which hopefully looks simpler.

`mode`

usize should be a more natural choice for the value type of the map.

There are some functionalities from the standard library that you can use to simplify the code, including max, filter, and map:

fn mode(numbers: &[i32]) -> Option<Vec<i32>> {
 let freqs = HashMap::<i32, usize>::new();
 for number in numbers {
 freqs.entry(number).or_insert(0) += 1;
 }
 let max_count = freqs.values().max()?;
 let mode = freqs
 .iter()
 .filter(|(_, &count)| count == max_count)
 .map(|(&number, _)| number)
 .collect();
 Some(mode)
}

`get_averages` & `print_averages`

Empty collections aren't common — I wouldn't waste two branches on them. Simply let the code run through:

fn get_averages(numbers: Vec<i32>) -> Option<Averages> {
 Some(Averages {
 mean: mean(vector)?,
 median: median(vector)?,
 mode: mode(vector)?,
 })
}

The same applies to print_averages — you're wasting three branches in total for non-empty collections. Personally, I think the multiline string literal disrupts the aesthetics of the code, so I got:

fn print_averages(numbers: Vec<i32>) {
 match get_averages(numbers) {
 None => println!("Empty collection"),
 Some(Average { mean, median, mode }) => {
 println!("Mean: {}", mean);
 println!("Median: {}", median);
 println!("Modes: {:?}", mode);
 }
 };
}

Doc comments & tests

As a bonus, you can add doc comments and tests to document your code if you have spare time.

@6005 It seems that we are the only active Rust reviewers now. Let's keep up the good work :)

Stack Exchange Network

Idiomatic use of `Option` and `unwrap` in mean, median and mode Rust program for Chapter 8 of The Book

2 Answers 2

Formatting

Argument passing

`mean`

Arithmetic

`None` vs `NaN`

Control flow

`mode`

`get_averages` & `print_averages`

Doc comments & tests

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Idiomatic use of `Option` and `unwrap` in mean, median and mode Rust program for Chapter 8 of The Book

2 Answers 2

Formatting

Argument passing

mean

Arithmetic

None vs NaN

Control flow

mode

get_averages & print_averages

Doc comments & tests

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`mean`

`None` vs `NaN`

`mode`

`get_averages` & `print_averages`