I've recently finished Chapter 8 of The Book and have started doing the exercises at the end of the chapter. This post pertains to the first exercise which involves writing a program to output the averages of a list of integers. I thought the instructions for this particular exercise was a bit vague, so I tried to make it simple by not including REPL functionality in my implementation. Anyway, here's my current solution:
// Given a list of integers, use a vector and return the mean (the average
// value), median (when sorted, the value in the middle position), and
// mode (the value that occurs most often; a hash map will be helpful
// here) of the list.
use std::collections::HashMap;
struct Averages {
mean: f64,
median: f64,
mode: Vec<i32>
}
fn sum(vector: &Vec<i32>) -> Option<i32> {
match vector.len() {
0 => None,
_ => {
let mut accumulator = 0;
for element in vector {
accumulator += element;
}
Some(accumulator)
}
}
}
fn mean(vector: &Vec<i32>) -> Option<f64> {
match vector.len() {
0 => None,
length =>
Some((sum(vector).unwrap() as f64) / (length as f64))
}
}
fn is_even(number: &usize) -> bool {
number % 2 == 0
}
fn median(vector_original: &Vec<i32>) -> Option<f64> {
match vector_original.len() {
0 => None,
length => {
let mut vector_cloned = vector_original.to_vec();
vector_cloned.sort_unstable();
match is_even(&length) {
true => {
let length_halved = length / 2;
Some(((
&vector_cloned[length_halved - 1] +
&vector_cloned[length_halved]) as f64) / 2.0)
},
false =>
Some(vector_cloned[((
(length as f64) /
2.0).floor() as usize)] as f64)
}
}
}
}
fn mode(vector: &Vec<i32>) -> Option<Vec<i32>> {
match vector.len() {
0 => None,
_ => {
let mut counts = HashMap::new();
for number in vector {
let count = counts.entry(number).or_insert(1u32);
*count += 1u32;
}
let mut max_count = 0;
for count in counts.values() {
if count > &max_count {
max_count = *count;
}
}
let mut modes: Vec<i32> = Vec::new();
for (number, count) in &counts {
if count == &max_count {
modes.push(**number);
}
}
Some(modes)
}
}
}
fn get_averages(vector: &Vec<i32>) -> Option<Averages> {
match vector.len() {
0 => None,
_ => Some(
Averages {
mean: mean(vector).unwrap(),
median: median(vector).unwrap(),
mode: mode(vector).unwrap()
}
)
}
}
fn print_averages(vector: &Vec<i32>) {
match vector.len() {
0 => println!("Vector is empty."),
_ => {
let averages: Averages = get_averages(vector).unwrap();
println!(
"Mean: {}
Median: {}
Mode(s): {:?}",
&averages.mean,
&averages.median,
&averages.mode
);
}
}
}
fn main() {
print_averages(&(10..=20).collect());
print_averages(&vec![5,5,5,4,3,2,1,1,1]);
print_averages(&vec![]);
}
I wrote the current implementation like I did as I was trying to write it such that each of the functions are reusable independent of each other and in contexts possibly outside of this particular program (hence the arguably redundant calls to len
). But now I'm wondering if using so many unwrap
s like I did is the idiomatic way to handle the Option
s or if there's a better way to have written this instead. I'm especially worried that my implementation might be brittle because I encountered a couple of panics when I was writing and testing it.
I'm also unsure if I'm doing borrowing and moving, as well as referencing and dereferencing optimally, especially in the mode
function; despite it working properly, I'm a little confused
as to what's its actually doing with the data and references, so some explanation about it would be really welcome.
Any other constructive criticisms in general are also welcome, so please nitpick to your hearts content!
2 Answers 2
Read 6005's answer first — the most important issues are covered there. Here are some additions:
Formatting
rustfmt
formats your code according to the official Rust Style
Guide, which most Rustaceans are familiar with, to ease
communication.
Argument passing
As 6005 pointed out, make your parameters &[i32]
instead of
Vec<i32>
. See Why is it discouraged to accept a reference to a
String
(&String
), Vec
(&Vec
), or Box
(&Box
) as a function
argument?
There is an exception: for median
, you clone the numbers and sort
the clone. If the caller has a disposable collection that will not
be used after calculating the median, this would be wasteful. I
suggest taking a Vec
by value here:
fn median(mut numbers: Vec<i32>) -> Option<f64> {
// ...
numbers.sort_unstable();
// ...
}
For Copy
types like usize
, there is no need to take a reference.
Simply pass the usize
instead.
mean
Since your sum
checks for empty collections already, you can use the
result rather than check again in mean
:
fn mean(numbers: &[i32]) -> Option<f64> {
(sum(vector)? as f64) / (numbers.len() as f64)
}
where the ?
operator automatically unwraps the Option
if it
is Some
and returns None
from the enclosing function otherwise.
Arithmetic
This is an unnecessary use of floating point arithmetic:
(((length as f64) / 2.0).floor() as usize)
Integer division rounds toward zero, so what you want is simply
length / 2
.
None
vs NaN
Since you're already returning f64
for mean
and median
, an
alternative to using Option
might be to simply return f64::NAN
when the collection is empty, as proper f64
arithmetic should be
NAN
-aware anyway. None
and NAN
both have their pros and cons.
Personally, I like to use NAN
to simplify the type, but the decision
is up to you.
Control flow
In median
, too many levels of indentation make the code less
readable. The match
es can be eliminated using if
and return
.
Combining the previous points, here's roughly how I would write
median
:
fn median(numbers: Vec<i32>) -> f64 {
let length = numbers.len();
if length == 0 {
return f64::NAN;
}
numbers.sort_unstable();
if is_even(length) {
let left = numbers[length / 2];
let right = numbers[length / 2 + 1];
(left as f64 + right as f64) / 2.0
} else {
numbers[length / 2] as f64
}
}
which hopefully looks simpler.
mode
usize
should be a more natural choice for the value type of the map.
There are some functionalities from the standard library that you can
use to simplify the code, including max
, filter
, and
map
:
fn mode(numbers: &[i32]) -> Option<Vec<i32>> {
let freqs = HashMap::<i32, usize>::new();
for number in numbers {
freqs.entry(number).or_insert(0) += 1;
}
let max_count = freqs.values().max()?;
let mode = freqs
.iter()
.filter(|(_, &count)| count == max_count)
.map(|(&number, _)| number)
.collect();
Some(mode)
}
get_averages
& print_averages
Empty collections aren't common — I wouldn't waste two branches on them. Simply let the code run through:
fn get_averages(numbers: Vec<i32>) -> Option<Averages> {
Some(Averages {
mean: mean(vector)?,
median: median(vector)?,
mode: mode(vector)?,
})
}
The same applies to print_averages
— you're wasting three
branches in total for non-empty collections. Personally, I think the
multiline string literal disrupts the aesthetics of the code, so I
got:
fn print_averages(numbers: Vec<i32>) {
match get_averages(numbers) {
None => println!("Empty collection"),
Some(Average { mean, median, mode }) => {
println!("Mean: {}", mean);
println!("Median: {}", median);
println!("Modes: {:?}", mode);
}
};
}
Doc comments & tests
As a bonus, you can add doc comments and tests to document your code if you have spare time.
-
\$\begingroup\$ Good answer, nice use of
?
which I didn't touch on \$\endgroup\$Caleb Stanford– Caleb Stanford2021年02月07日 05:06:54 +00:00Commented Feb 7, 2021 at 5:06 -
1\$\begingroup\$ @6005 It seems that we are the only active Rust reviewers now. Let's keep up the good work :) \$\endgroup\$L. F.– L. F.2021年02月07日 05:11:43 +00:00Commented Feb 7, 2021 at 5:11
Good job in separating what you did into self-contained functions. Some comments on design and style:
First things first: always run Clippy! (
cargo clippy
) Its feedback is usually helpful. Here it notices a few problems:Whenever you have a function with
&Vec<T>
as an argument, it should almost always be rewritten as taking a slice&[T]
. Think of a slice as a more general vector-like memory, so a function which accepts&[T]
is more general than one that accepts&Vec<T>
. This also has the side effect of changing your main function a bit:fn main() { let v: Vec<i32> = (10..=20).collect(); print_averages(&v); print_averages(&[5, 5, 5, 4, 3, 2, 1, 1, 1]); print_averages(&[]); }
In the line
((&vector_cloned[length_halved - 1] + &vector_cloned[length_halved])
, you can remove both ampersands.
In your
sum
function, returning0
for the sum of an empty list (so,fn sum(vector: &[i32]) -> i32
) is cleaner and more standard. Also, there is a built-in for this function:vector.iter().sum()
Your
mean
function is clear, the use ofmatch vector.len()
is clever (the alternative isif vector.is_empty() {
, but then you still have to call.len()
again in the else branch. So I like your design choice better.)However in contrast, in
mode
andprint_averages
, here it is clearer to useif vector.is_empty() {
since you aren't using the length in the logic.Relatedly, in
median
when you usematch is_even(&length)
, here I think it is clearer to use an if statement, since there is no value associated with one of the match branches that you need access to.Also in
median
I think some of the logic can be clearer using intermediate variables instead of complex expressions. In particular:let ele1 = vector_cloned[length_halved - 1]; let ele2 = vector_cloned[length_halved]; Some(((ele1 + ele2) as f64) / 2.0)
To answer your specific questions:
But now I'm wondering if using so many unwraps like I did is the idiomatic way to handle the Options or if there's a better way to have written this instead. I'm especially worried that my implementation might be brittle because I encountered a couple of panics when I was writing and testing it.
Using unwraps
here is actually pretty justified: you statically know that these vectors are nonempty so your unwrap
s should never panic. You are right to be concerned though as normally unwrap
s are best avoided. If you prefer you can do .expect("unreachable")
to indicate what exactly went wrong.
I'm also unsure if I'm doing borrowing and moving, as well as referencing and dereferencing optimally, especially in the mode function.
Edited to add: some comments on the mode function:
Some improvements to your use of references:
You created a
HashMap<&i32, u32>
; this leads to some weird dereferencing in the code. It would be better to create aHashMap<i32, u32>
. You can make sure you are doing it correctly by using a type annotation:let mut counts: HashMap<i32, u32> = HashMap::new();
For the first for loop, you can use
.and_modify
to simplify the logic: (note vector is a reference / slice):
let mut counts: HashMap<i32, u32> = HashMap::new();
for &number in vector {
counts.entry(number).and_modify(|count| *count += 1).or_insert(1u32);
}
- The second for loop can be avoided using
.max()
directly on iterators:
let max_count: u32 = *counts.values().max().unwrap()
- Once you have a
HashMap<i32, u32>
, the third for loop becomes much nicer. You can avoid references completely by consuming the hashmap in the for loop:
for (number, count) in counts {
if count == max_count {
modes.push(number);
}
}
But the most common way is to iterate by reference, which would normally be written this way:
for (&number, &count) in &counts {
if count == max_count {
modes.push(number);
}
}
Or equivalently if you prefer:
for (number, count) in &counts {
if *count == max_count {
modes.push(*number);
}
}
You can also do the last loop with a filter instead of an explicit for loop, though when learning it's always good to write out for loops for practice.
Also for larger projects, ultimately the best way to do this would probably be to use a dedicated multiset collection, for instance the
multiset
crate. The mode body should be really pretty short, likelet counts: MultiSet<i32> = vector.collect(); counts.iter().max_by_key(...)
.
Explore related questions
See similar questions with these tags.