7
\$\begingroup\$

For fun, I decided to implement the quickest Quicksort I could. This is what I came up with. How can I make it better?

I know that it's slower for sorted and already reversed lists (through testing), but I'm not certain why. What I do know is that this algorithm reverses the list past the pivot, if the list was sorted/reversed to begin with, but I'm not sure why that would slow it down as much as it does.

#pragma once
#include "InsertionSort.h" // assume this has a good implementation of Insertion Sort
namespace QuickSort
{
 template<typename iterator>
 iterator medianOfThree(iterator low, iterator high) {
 iterator mid = low + (high - low) / 2; // get an iterator to the middle index
 if (*low <= *mid and *mid <= *high) return mid;
 if (*high <= *mid and *mid <= *low) return mid;
 if (*mid <= *low and *low <= *high) return low;
 if (*high <= *low and *low <= *mid) return low;
 return high;
 }
 /*
 * Sorts the list given by the two iterators (begin and end) by virtue of
 * Quicksort.
 */
 template<typename iterator>
 void sort(iterator begin, iterator end) {
 // the loop handles one of the partitions, guaranteeing "tail-call" optimization for the larger
 // partition. The smaller partition goes into the recursive step
 while (end - begin > 1) {
 if (end - begin < 20) { // this is supposed to make it faster for small lists
 InsertionSort::sort(begin, end);
 return;
 }
 auto value = *medianOfThree(begin, end - 1);
 iterator i = begin; // i marks the top of the "smaller" partition
 iterator j = begin; // j marks the bottom of the "larger" partition
 iterator n = end - 1; // n allows us to swap the larger values to the end
 // This partitioning algorithm allows us to partition into 3 groups:
 // smaller, equal, larger.
 // This makes it so that with large amounts of repeated values, we are faster.
 while (j <= n) {
 if (*j < value) {
 std::iter_swap(i, j);
 ++i;
 ++j;
 } else if (value < *j) {
 std::iter_swap(j, n);
 --n;
 } else {
 ++j;
 }
 }
 // at this point, i marks the end of the "smaller" partition
 // and j marks the beginning of the "larger" partition.
 // Note that between them resides values equal to the pivot.
 int size1 = i - begin; // this is the size of the "smaller" partition
 int size2 = end - j; // this is the size of the "larger" partition
 if (size1 < size2) {
 QuickSort::sort(begin, i); // recurse on the smaller partition
 begin = j; // this partition is handled by the next iteration of the loop
 } else {
 QuickSort::sort(j, end);
 end = i;
 }
 }
 }
}

When I tested this code, this is what I got:

Sorting a sorted vector of size 1000000 Seconds: 1.159602842
Sorting a sorted vector of size 1500000 Seconds: 1.160466839
Sorting a sorted vector of size 2000000 Seconds: 2.269401411
Sorting a reversed vector of size 1000000 Seconds: 0.551178608
Sorting a reversed vector of size 1500000 Seconds: 0.828606600
Sorting a reversed vector of size 2000000 Seconds: 1.263003406
Sorting a random vector of size 1000000 Seconds: 0.099294498
Sorting a random vector of size 1500000 Seconds: 0.151264950
Sorting a random vector of size 2000000 Seconds: 0.210116894
Sorting a binary random vector of size 1000000 Seconds: 0.006512821
Sorting a binary random vector of size 1500000 Seconds: 0.009594559
Sorting a binary random vector of size 2000000 Seconds: 0.013067604
Sorting a small random vector of size 1000000 Seconds: 0.090787581
Sorting a small random vector of size 1500000 Seconds: 0.137927258
Sorting a small random vector of size 2000000 Seconds: 0.405081084

Where sorted means consecutive integers from 0 to size - 1, reversed is the reverse of that, random means calls to rand() (I did not seed it), binary random means rand() % 2, and small random means rand() % (size / 10).

asked Apr 4, 2015 at 1:37
\$\endgroup\$
4
  • 2
    \$\begingroup\$ Amazingly, the binary random vector sorts extremely fast for huge sizes. At 100 million, it took about .6 of a second. \$\endgroup\$ Commented Apr 4, 2015 at 1:55
  • \$\begingroup\$ That speed is faster than the STL sort, which took 2.8 seconds. Strangely, at 400 million, it takes 3.2 seconds, while the STL sort crashes. \$\endgroup\$ Commented Apr 4, 2015 at 3:20
  • \$\begingroup\$ I do wonder a bit how you're timing things; maybe your small random performance is because you're timing how slow % is. \$\endgroup\$ Commented Apr 4, 2015 at 4:34
  • \$\begingroup\$ @Hurkyl I could post the code for my tester, but the way it works is that I generate each of the vectors to sort, then I copy them for each sort, and I take the time immediately before and after the sort, then sort using the STL sort and check that the sort worked right. \$\endgroup\$ Commented Apr 4, 2015 at 4:36

1 Answer 1

3
\$\begingroup\$

It's been ages since I last understood quick sort, so unfortunately I don't have much to say on the implementation from an algorithmic point of view (though it does look good to me). I do have a few notes on style/technicalities though.

Style

  • Is value the pivot? If so name it pivot instead of value. Pretty much everything in a program is a 'value' so the name is meaningless.
  • size1 and size2 could use more meaningful names. In fact, with proper names, you won't need the comments for them.
  • Except for super obvious cases (i.e. for (int i = ...)), try to avoid one letter variable names.
  • Since you're doing ProperCase type names (at least I assume so from the namespace), the template params should be Iterator instead of iterator.
  • It's pretty obvious since it's quicksort, but I like to specify iterator type in template params like RAIterator. In cases where it's less obvious, it can be quicker to glance at than the docs.
  • I don't know if I would use the and operator. There's no harm in it, but it's highly unusual to actually see it.
  • medianOfThree could be rearranged so that it's a bit terser by nesting some of the second conditionals instead of having a line for each case. (You could probably also abuse std::min and std::max, but meh... I think that'd just make it more complicated.)

Technicalities

Use type deduction instead of int for the iterator differences. It's highly unlikely that you'll actually have a range larger than the capacity of an int, but it's always better to be on the safe side.


Support for #pragma once is pretty widespread now, but it's still not standard. It's completely acceptable to use it as long as you're aware of the trade offs, but deviation from the standard should never be taken lightly.


auto value = *medianOfThree(begin, end - 1);

Depending on the type of value, this could be incredibly expensive. Unfortunately, since the iter_swap could change the value that value would refer to if it were made a reference, changing this is non-trivial. Really, it's probably not worth doing unless copying is expected to be expensive, but it's still interesting to think about.

Basically what you'd need to do to avoid a copy is to have value be an iterator instead of a value. Then when the iter_swap calls are made, you would need to check if either of the iterators being swapped is equal to value. If one were equal, you would need to do the swap, then make value be an iterator to the other one (so value still pointed at the correct value).

This could actually be wrapped in a fairly pleasant function if you wanted go this route:

template<typename Iter>
void iter_swap_maintain(Iter swap_a, Iter swap_b, Iter& iter) {
 std::iter_swap(swap_a, swap_b);
 if (iter == swap_a) {
 iter = swap_b;
 } else if (set == swap_b) {
 iter = swap_a;
 }
}

In the case of cheap copies, this would of course be (potentially very significantly) slower than just copying value and using iter_swap directly.

answered Apr 4, 2015 at 2:43
\$\endgroup\$
2
  • \$\begingroup\$ Strangely, just doing const auto& value = ... makes the sort fail, but if I instead change the return type of medianOfThree to auto and have it return *low, *mid, or *high, it works. \$\endgroup\$ Commented Apr 4, 2015 at 4:02
  • \$\begingroup\$ @Quincunx That will still create a copy, so there's not anything to gain there in terms of performance. I just realized that I was wrong in saying "you don't access it after its location has potentially changed." Unfortunately, if one of the iter_swaps changes the value at the location value points to, then the sort will indeed break. It would seem the copy is unavoidable. Or rather, unless you want to make value be an iterator instead of a value and update it appropriately if iter_swap is going to be applied to the item at which it points. \$\endgroup\$ Commented Apr 4, 2015 at 4:06

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.