For fun, I decided to implement the quickest Quicksort I could. This is what I came up with. How can I make it better?
I know that it's slower for sorted and already reversed lists (through testing), but I'm not certain why. What I do know is that this algorithm reverses the list past the pivot, if the list was sorted/reversed to begin with, but I'm not sure why that would slow it down as much as it does.
#pragma once
#include "InsertionSort.h" // assume this has a good implementation of Insertion Sort
namespace QuickSort
{
template<typename iterator>
iterator medianOfThree(iterator low, iterator high) {
iterator mid = low + (high - low) / 2; // get an iterator to the middle index
if (*low <= *mid and *mid <= *high) return mid;
if (*high <= *mid and *mid <= *low) return mid;
if (*mid <= *low and *low <= *high) return low;
if (*high <= *low and *low <= *mid) return low;
return high;
}
/*
* Sorts the list given by the two iterators (begin and end) by virtue of
* Quicksort.
*/
template<typename iterator>
void sort(iterator begin, iterator end) {
// the loop handles one of the partitions, guaranteeing "tail-call" optimization for the larger
// partition. The smaller partition goes into the recursive step
while (end - begin > 1) {
if (end - begin < 20) { // this is supposed to make it faster for small lists
InsertionSort::sort(begin, end);
return;
}
auto value = *medianOfThree(begin, end - 1);
iterator i = begin; // i marks the top of the "smaller" partition
iterator j = begin; // j marks the bottom of the "larger" partition
iterator n = end - 1; // n allows us to swap the larger values to the end
// This partitioning algorithm allows us to partition into 3 groups:
// smaller, equal, larger.
// This makes it so that with large amounts of repeated values, we are faster.
while (j <= n) {
if (*j < value) {
std::iter_swap(i, j);
++i;
++j;
} else if (value < *j) {
std::iter_swap(j, n);
--n;
} else {
++j;
}
}
// at this point, i marks the end of the "smaller" partition
// and j marks the beginning of the "larger" partition.
// Note that between them resides values equal to the pivot.
int size1 = i - begin; // this is the size of the "smaller" partition
int size2 = end - j; // this is the size of the "larger" partition
if (size1 < size2) {
QuickSort::sort(begin, i); // recurse on the smaller partition
begin = j; // this partition is handled by the next iteration of the loop
} else {
QuickSort::sort(j, end);
end = i;
}
}
}
}
When I tested this code, this is what I got:
Sorting a sorted vector of size 1000000 Seconds: 1.159602842 Sorting a sorted vector of size 1500000 Seconds: 1.160466839 Sorting a sorted vector of size 2000000 Seconds: 2.269401411 Sorting a reversed vector of size 1000000 Seconds: 0.551178608 Sorting a reversed vector of size 1500000 Seconds: 0.828606600 Sorting a reversed vector of size 2000000 Seconds: 1.263003406 Sorting a random vector of size 1000000 Seconds: 0.099294498 Sorting a random vector of size 1500000 Seconds: 0.151264950 Sorting a random vector of size 2000000 Seconds: 0.210116894 Sorting a binary random vector of size 1000000 Seconds: 0.006512821 Sorting a binary random vector of size 1500000 Seconds: 0.009594559 Sorting a binary random vector of size 2000000 Seconds: 0.013067604 Sorting a small random vector of size 1000000 Seconds: 0.090787581 Sorting a small random vector of size 1500000 Seconds: 0.137927258 Sorting a small random vector of size 2000000 Seconds: 0.405081084
Where sorted
means consecutive integers from 0
to size - 1
, reversed
is the reverse of that, random
means calls to rand()
(I did not seed it), binary random
means rand() % 2
, and small random
means rand() % (size / 10)
.
1 Answer 1
It's been ages since I last understood quick sort, so unfortunately I don't have much to say on the implementation from an algorithmic point of view (though it does look good to me). I do have a few notes on style/technicalities though.
Style
- Is
value
the pivot? If so name itpivot
instead ofvalue
. Pretty much everything in a program is a 'value' so the name is meaningless. size1
andsize2
could use more meaningful names. In fact, with proper names, you won't need the comments for them.- Except for super obvious cases (i.e.
for (int i = ...)
), try to avoid one letter variable names. - Since you're doing ProperCase type names (at least I assume so from the namespace), the template params should be
Iterator
instead ofiterator
. - It's pretty obvious since it's quicksort, but I like to specify iterator type in template params like
RAIterator
. In cases where it's less obvious, it can be quicker to glance at than the docs. - I don't know if I would use the
and
operator. There's no harm in it, but it's highly unusual to actually see it. medianOfThree
could be rearranged so that it's a bit terser by nesting some of the second conditionals instead of having a line for each case. (You could probably also abusestd::min
andstd::max
, but meh... I think that'd just make it more complicated.)
Technicalities
Use type deduction instead of int
for the iterator differences. It's highly unlikely that you'll actually have a range larger than the capacity of an int
, but it's always better to be on the safe side.
Support for #pragma once
is pretty widespread now, but it's still not standard. It's completely acceptable to use it as long as you're aware of the trade offs, but deviation from the standard should never be taken lightly.
auto value = *medianOfThree(begin, end - 1);
Depending on the type of value, this could be incredibly expensive. Unfortunately, since the iter_swap
could change the value that value
would refer to if it were made a reference, changing this is non-trivial. Really, it's probably not worth doing unless copying is expected to be expensive, but it's still interesting to think about.
Basically what you'd need to do to avoid a copy is to have value
be an iterator instead of a value. Then when the iter_swap
calls are made, you would need to check if either of the iterators being swapped is equal to value
. If one were equal, you would need to do the swap, then make value
be an iterator to the other one (so value
still pointed at the correct value).
This could actually be wrapped in a fairly pleasant function if you wanted go this route:
template<typename Iter>
void iter_swap_maintain(Iter swap_a, Iter swap_b, Iter& iter) {
std::iter_swap(swap_a, swap_b);
if (iter == swap_a) {
iter = swap_b;
} else if (set == swap_b) {
iter = swap_a;
}
}
In the case of cheap copies, this would of course be (potentially very significantly) slower than just copying value
and using iter_swap
directly.
-
\$\begingroup\$ Strangely, just doing
const auto& value = ...
makes the sort fail, but if I instead change the return type ofmedianOfThree
toauto
and have it return*low
,*mid
, or*high
, it works. \$\endgroup\$Justin– Justin2015年04月04日 04:02:04 +00:00Commented Apr 4, 2015 at 4:02 -
\$\begingroup\$ @Quincunx That will still create a copy, so there's not anything to gain there in terms of performance. I just realized that I was wrong in saying "you don't access it after its location has potentially changed." Unfortunately, if one of the
iter_swap
s changes the value at the locationvalue
points to, then the sort will indeed break. It would seem the copy is unavoidable. Or rather, unless you want to make value be an iterator instead of a value and update it appropriately ifiter_swap
is going to be applied to the item at which it points. \$\endgroup\$Corbin– Corbin2015年04月04日 04:06:32 +00:00Commented Apr 4, 2015 at 4:06
%
is. \$\endgroup\$