I've just started learning Rust and put together this working MergeSort implementation. Wondering what I've overlooked and what I could improve.
I couldn't figure out a cleaner way to deal with the Option
type returned by the iterator than just checking that values weren't None
and then .unwrap()
ing my results but I'd really appreciate any suggestions.
Included working test cases. Let me know if there's a case I missed.
pub fn merge_sort(mut input: Vec<usize>) -> Vec<usize> {
let length = input.len();
// Base case
if length < 2 {
return input;
}
let right = merge_sort(input.split_off(length / 2));
let left = merge_sort(input);
let mut left_iter = left.iter().peekable();
let mut right_iter = right.iter().peekable();
let mut merged = Vec::new();
let mut i = 0;
while i < length {
if left_iter.peek() > right_iter.peek() && right_iter.peek() != None {
merged.push(*(right_iter.next().unwrap()));
} else if left_iter.peek() != None {
merged.push(*(left_iter.next().unwrap()));
}
i += 1;
}
merged
}
#[cfg(test)]
mod tests {
use merge_sort;
#[test]
fn it_works() {
assert_eq!(merge_sort(vec![4, 3]), vec![3, 4]);
}
#[test]
fn test_base_case() {
assert_eq!(merge_sort(vec![3]), vec![3]);
}
#[test]
fn totally_backwards() {
assert_eq!(
merge_sort(vec![100, 90, 50, 14, 9, 7, 3]),
vec![3, 7, 9, 14, 50, 90, 100]
);
}
}
2 Answers 2
Disclaimer: I'm also new to Rust, but I have a background in C, C++ and Haskell. Take everything I say with a grain of salt.
All of that looks reasonable, except for the while
loop and the ownership. And there's a bug.
Bug on sorted vectors
Did you try your code on sorted sets?
#[test]
fn it_works_on_sorted() {
assert_eq!(merge_sort(vec![1, 2, 3, 4]), vec![1, 2, 3, 4]);
}
Your code won't work. Instead, you'll end up with vec![1]
. That's because
left_iter.peek() > right_iter.peek()
is false
as soon as we run out of elements on the left hand side, since
None < Some(x)
is always true for all x
. But in this case we won't add the elements from the right hand side! Instead, we look at our now empty left list:
if left_iter.peek() != None {
// left_iter.peek() is None, not executed
merged.push(*(left_iter.next().unwrap()));
}
// ALWAYS executed
i += 1;
Since i += 1
is always executed, we end up with i == length
, although we never actually push
ed the right-hand side. So i < length
hides a bug. The next section shows how to get rid of that one:
While
Why do we have to keep track of the length of our vector? We have iterators at hand, so that shouldn't be necessary, right? When either of our iterators is at end, we fill our vector with the rest.
That's rather easy:
while let (Some(&x), Some(&y)) = (left_iter.peek(), right_iter.peek()) {
if *x <= *y {
merged.push(*(left_iter.next().unwrap()))
} else {
merged.push(*(right_iter.next().unwrap()))
}
}
for x in left_iter {
merged.push(*x)
}
for y in right_iter {
merged.push(*y)
}
Vector::new()
vs Vector::with_capacity
We already know that merged
will have length
elements, therefore we should use Vector::with_capacity
instead:
let mut merged = Vec::with_capacity(length);
Ownership
merge_sort
could take a slice instead of a vector. That way it's more general. The actual implementation is left as an exercise, but it's more or less the same:
/// Sorts a slice in-place with mergesort.
///
/// ```
/// let mut example = [1,4,2,5,3];
///
/// merge_sort_inplace(example)
/// assert_eq!(example, [1,2,3,4,5]);
/// ```
pub fn merge_sort_inplace(input: &mut [usize]) {
let length = input.len();
if length < 2 {
return;
}
let (left, right) = input.split_at_mut(length / 2);
merge_sort_inplace(left);
merge_sort_inplace(right);
// ... similar to your variant
}
Note that the name is slightly a misnomer, merge_sort_inplace
will still need \$\mathcal \Theta(n)\$ additional memory unless you pull some tricks.
We can even re-use that variant if you still want to take ownership of the vector:
fn merge_sort(mut input: Vec<usize>) -> Vec<usize> {
merge_sort_inplace(&mut input);
input
}
-
\$\begingroup\$ Since it's not obvious from your sample code, I should note that your
merge_sort_inplace
will still need to allocate a temporary array to store the merge results (or, alternatively, to copy the halves of the input before sorting them, and then merge them back into the original array). There are ways around that, sort of, but they're not simple and they still require some extra work space for temporary results. \$\endgroup\$Ilmari Karonen– Ilmari Karonen2018年03月11日 13:24:47 +00:00Commented Mar 11, 2018 at 13:24 -
\$\begingroup\$ @IlmariKaronen I've added a remark, but I didn't mean
merge_sort_inplace
to be completely in-place, just the result to get written back into the&mut input
. \$\endgroup\$Zeta– Zeta2018年03月11日 19:00:13 +00:00Commented Mar 11, 2018 at 19:00 -
\$\begingroup\$ Thanks for your feedback. I was able to get a final implementation with slices. The most interesting thing I had to solve here was creating the
merged
Vec outside a new scope so that the borrowing of ownership used when creatingleft
andright
would expire. Not an original idea, I borrowed it from this implementation \$\endgroup\$Christopher Zehner– Christopher Zehner2018年03月12日 02:55:36 +00:00Commented Mar 12, 2018 at 2:55 -
1\$\begingroup\$ @ChristopherZehner good job. I would put the initialization of
merged
after sorting both sides, though, e.g. just writelet mut merged;
and later writemerged = Vec::with_capacity(length);
after you sorted the left and right side. That way, you will use at most one copy of your array, whereas allocationmerged
before you sort the subranges leads to (almost) two copies. \$\endgroup\$Zeta– Zeta2018年03月12日 06:33:01 +00:00Commented Mar 12, 2018 at 6:33
The one obvious thing I see is that you are consuming the input. I'd suggest you try using slices instead, so that you don't consume the inputs and you don't have to spend time and memory splitting the vector.
-
\$\begingroup\$ Consuming the input might still be necessary when extending this vector to support types that are not
Copy
, e.g. sorting aVec<String>
. I need to consume the input so that the elements can be moved to the output Vec. I don't think even a&mut [T]
mutable slice would be sufficient? Of course, this is irrelevant for aVec<usize>
where operating on slices would be much more elegant. \$\endgroup\$amon– amon2018年03月11日 11:41:53 +00:00Commented Mar 11, 2018 at 11:41
beta
.)) \$\endgroup\$