Is this implementation of Quicksort good?

Question 1

For fun, I decided to implement the quickest Quicksort I could. This is what I came up with. How can I make it better?

I know that it's slower for sorted and already reversed lists (through testing), but I'm not certain why. What I do know is that this algorithm reverses the list past the pivot, if the list was sorted/reversed to begin with, but I'm not sure why that would slow it down as much as it does.

#pragma once
#include "InsertionSort.h" // assume this has a good implementation of Insertion Sort
namespace QuickSort
{
 template<typename iterator>
 iterator medianOfThree(iterator low, iterator high) {
 iterator mid = low + (high - low) / 2; // get an iterator to the middle index
 if (*low <= *mid and *mid <= *high) return mid;
 if (*high <= *mid and *mid <= *low) return mid;
 if (*mid <= *low and *low <= *high) return low;
 if (*high <= *low and *low <= *mid) return low;
 return high;
 }
 /*
 * Sorts the list given by the two iterators (begin and end) by virtue of
 * Quicksort.
 */
 template<typename iterator>
 void sort(iterator begin, iterator end) {
 // the loop handles one of the partitions, guaranteeing "tail-call" optimization for the larger
 // partition. The smaller partition goes into the recursive step
 while (end - begin > 1) {
 if (end - begin < 20) { // this is supposed to make it faster for small lists
 InsertionSort::sort(begin, end);
 return;
 }
 auto value = *medianOfThree(begin, end - 1);
 iterator i = begin; // i marks the top of the "smaller" partition
 iterator j = begin; // j marks the bottom of the "larger" partition
 iterator n = end - 1; // n allows us to swap the larger values to the end
 // This partitioning algorithm allows us to partition into 3 groups:
 // smaller, equal, larger.
 // This makes it so that with large amounts of repeated values, we are faster.
 while (j <= n) {
 if (*j < value) {
 std::iter_swap(i, j);
 ++i;
 ++j;
 } else if (value < *j) {
 std::iter_swap(j, n);
 --n;
 } else {
 ++j;
 }
 }
 // at this point, i marks the end of the "smaller" partition
 // and j marks the beginning of the "larger" partition.
 // Note that between them resides values equal to the pivot.
 int size1 = i - begin; // this is the size of the "smaller" partition
 int size2 = end - j; // this is the size of the "larger" partition
 if (size1 < size2) {
 QuickSort::sort(begin, i); // recurse on the smaller partition
 begin = j; // this partition is handled by the next iteration of the loop
 } else {
 QuickSort::sort(j, end);
 end = i;
 }
 }
 }
}

When I tested this code, this is what I got:

Sorting a sorted vector of size 1000000 Seconds: 1.159602842
Sorting a sorted vector of size 1500000 Seconds: 1.160466839
Sorting a sorted vector of size 2000000 Seconds: 2.269401411
Sorting a reversed vector of size 1000000 Seconds: 0.551178608
Sorting a reversed vector of size 1500000 Seconds: 0.828606600
Sorting a reversed vector of size 2000000 Seconds: 1.263003406
Sorting a random vector of size 1000000 Seconds: 0.099294498
Sorting a random vector of size 1500000 Seconds: 0.151264950
Sorting a random vector of size 2000000 Seconds: 0.210116894
Sorting a binary random vector of size 1000000 Seconds: 0.006512821
Sorting a binary random vector of size 1500000 Seconds: 0.009594559
Sorting a binary random vector of size 2000000 Seconds: 0.013067604
Sorting a small random vector of size 1000000 Seconds: 0.090787581
Sorting a small random vector of size 1500000 Seconds: 0.137927258
Sorting a small random vector of size 2000000 Seconds: 0.405081084

Where sorted means consecutive integers from 0 to size - 1, reversed is the reverse of that, random means calls to rand() (I did not seed it), binary random means rand() % 2, and small random means rand() % (size / 10).

Question 2

Amazingly, the binary random vector sorts extremely fast for huge sizes. At 100 million, it took about .6 of a second.

Question 3

That speed is faster than the STL sort, which took 2.8 seconds. Strangely, at 400 million, it takes 3.2 seconds, while the STL sort crashes.

Question 4

I do wonder a bit how you're timing things; maybe your small random performance is because you're timing how slow % is.

Question 5

@Hurkyl I could post the code for my tester, but the way it works is that I generate each of the vectors to sort, then I copy them for each sort, and I take the time immediately before and after the sort, then sort using the STL sort and check that the sort worked right.

Question 6

It's been ages since I last understood quick sort, so unfortunately I don't have much to say on the implementation from an algorithmic point of view (though it does look good to me). I do have a few notes on style/technicalities though.

Style

Is value the pivot? If so name it pivot instead of value. Pretty much everything in a program is a 'value' so the name is meaningless.
size1 and size2 could use more meaningful names. In fact, with proper names, you won't need the comments for them.
Except for super obvious cases (i.e. for (int i = ...)), try to avoid one letter variable names.
Since you're doing ProperCase type names (at least I assume so from the namespace), the template params should be Iterator instead of iterator.
It's pretty obvious since it's quicksort, but I like to specify iterator type in template params like RAIterator. In cases where it's less obvious, it can be quicker to glance at than the docs.
I don't know if I would use the and operator. There's no harm in it, but it's highly unusual to actually see it.
medianOfThree could be rearranged so that it's a bit terser by nesting some of the second conditionals instead of having a line for each case. (You could probably also abuse std::min and std::max, but meh... I think that'd just make it more complicated.)

Technicalities

Use type deduction instead of int for the iterator differences. It's highly unlikely that you'll actually have a range larger than the capacity of an int, but it's always better to be on the safe side.

Support for #pragma once is pretty widespread now, but it's still not standard. It's completely acceptable to use it as long as you're aware of the trade offs, but deviation from the standard should never be taken lightly.

auto value = *medianOfThree(begin, end - 1);

Depending on the type of value, this could be incredibly expensive. Unfortunately, since the iter_swap could change the value that value would refer to if it were made a reference, changing this is non-trivial. Really, it's probably not worth doing unless copying is expected to be expensive, but it's still interesting to think about.

Basically what you'd need to do to avoid a copy is to have value be an iterator instead of a value. Then when the iter_swap calls are made, you would need to check if either of the iterators being swapped is equal to value. If one were equal, you would need to do the swap, then make value be an iterator to the other one (so value still pointed at the correct value).

This could actually be wrapped in a fairly pleasant function if you wanted go this route:

template<typename Iter>
void iter_swap_maintain(Iter swap_a, Iter swap_b, Iter& iter) {
 std::iter_swap(swap_a, swap_b);
 if (iter == swap_a) {
 iter = swap_b;
 } else if (set == swap_b) {
 iter = swap_a;
 }
}

In the case of cheap copies, this would of course be (potentially very significantly) slower than just copying value and using iter_swap directly.

Question 7

Strangely, just doing const auto& value = ... makes the sort fail, but if I instead change the return type of medianOfThree to auto and have it return *low, *mid, or *high, it works.

Question 8

@Quincunx That will still create a copy, so there's not anything to gain there in terms of performance. I just realized that I was wrong in saying "you don't access it after its location has potentially changed." Unfortunately, if one of the iter_swaps changes the value at the location value points to, then the sort will indeed break. It would seem the copy is unavoidable. Or rather, unless you want to make value be an iterator instead of a value and update it appropriately if iter_swap is going to be applied to the item at which it points.

Corbin Corbin 10.6k2 gold badges30 silver badges51 bronze badges · Accepted Answer · 2015-04-04 02:43:27Z

It's been ages since I last understood quick sort, so unfortunately I don't have much to say on the implementation from an algorithmic point of view (though it does look good to me). I do have a few notes on style/technicalities though.

Style

Is value the pivot? If so name it pivot instead of value. Pretty much everything in a program is a 'value' so the name is meaningless.
size1 and size2 could use more meaningful names. In fact, with proper names, you won't need the comments for them.
Except for super obvious cases (i.e. for (int i = ...)), try to avoid one letter variable names.
Since you're doing ProperCase type names (at least I assume so from the namespace), the template params should be Iterator instead of iterator.
It's pretty obvious since it's quicksort, but I like to specify iterator type in template params like RAIterator. In cases where it's less obvious, it can be quicker to glance at than the docs.
I don't know if I would use the and operator. There's no harm in it, but it's highly unusual to actually see it.
medianOfThree could be rearranged so that it's a bit terser by nesting some of the second conditionals instead of having a line for each case. (You could probably also abuse std::min and std::max, but meh... I think that'd just make it more complicated.)

Technicalities

Use type deduction instead of int for the iterator differences. It's highly unlikely that you'll actually have a range larger than the capacity of an int, but it's always better to be on the safe side.

Support for #pragma once is pretty widespread now, but it's still not standard. It's completely acceptable to use it as long as you're aware of the trade offs, but deviation from the standard should never be taken lightly.

auto value = *medianOfThree(begin, end - 1);

Depending on the type of value, this could be incredibly expensive. Unfortunately, since the iter_swap could change the value that value would refer to if it were made a reference, changing this is non-trivial. Really, it's probably not worth doing unless copying is expected to be expensive, but it's still interesting to think about.

Basically what you'd need to do to avoid a copy is to have value be an iterator instead of a value. Then when the iter_swap calls are made, you would need to check if either of the iterators being swapped is equal to value. If one were equal, you would need to do the swap, then make value be an iterator to the other one (so value still pointed at the correct value).

This could actually be wrapped in a fairly pleasant function if you wanted go this route:

template<typename Iter>
void iter_swap_maintain(Iter swap_a, Iter swap_b, Iter& iter) {
 std::iter_swap(swap_a, swap_b);
 if (iter == swap_a) {
 iter = swap_b;
 } else if (set == swap_b) {
 iter = swap_a;
 }
}

In the case of cheap copies, this would of course be (potentially very significantly) slower than just copying value and using iter_swap directly.

Strangely, just doing const auto& value = ... makes the sort fail, but if I instead change the return type of medianOfThree to auto and have it return *low, *mid, or *high, it works.
@Quincunx That will still create a copy, so there's not anything to gain there in terms of performance. I just realized that I was wrong in saying "you don't access it after its location has potentially changed." Unfortunately, if one of the iter_swaps changes the value at the location value points to, then the sort will indeed break. It would seem the copy is unavoidable. Or rather, unless you want to make value be an iterator instead of a value and update it appropriately if iter_swap is going to be applied to the item at which it points.

Stack Exchange Network

Is this implementation of Quicksort good?

1 Answer 1

Style

Technicalities

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Is this implementation of Quicksort good?

1 Answer 1

Style

Technicalities

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions