What do you think of this piece of code ? This is an implementation of quick sort in C, but I'm not sure about the quality and correctness.
#include <stdio.h>
void swap(int tab[], int a, int b)
{
int temp = tab[a];
tab[a] = tab[b];
tab[b] = temp;
}
void quickSort(int tab[], int begin, int end)
{
int left = begin-1;
int right = end+1;
const int pivot = tab[begin];
if(begin >= end)
return;
while(1)
{
do right--; while(tab[right] > pivot);
do left++; while(tab[left] < pivot);
if(left < right)
swap(tab, left, right);
else break;
}
quickSort(tab, begin, right);
quickSort(tab, right+1, end);
}
int main(void)
{
int tab[5] = {5, 3, 4, 1, 2};
int i;
quickSort(tab, 0, 4);
for(i = 0; i < 5; i++)
{
printf("%d ", tab[i]);
}
putchar('\n');
return 0;
}
Is it the standard way of implementing a quick sort in C ?
2 Answers 2
The code is not a bad implementation of Quicksort but it has a flaw, inherent to the Quicksort algorithm, that you can easily fix. I tested the code with this main:
const int testsize = 100000;
int main(void)
{
int tab[testsize];
int i;
for(i = 1; i < testsize; i++)
tab[i] = i;
tab[0] = testsize+1;
quickSort(tab, 0, testsize-1);
return 0;
}
As you can see, this is deliberately the worst case scenario for Quicksort in which it takes \$O(n^2)\$ time to sort. This code took 11.85 seconds on my machine. Quicksort works best on randomly arranged data, so paradoxically, you can often actually improve sorting time by randomly shuffling the data before you start. I wrote this rather simple-minded shuffle routine:
void shuffle(int tab[], int begin, int end)
{
for (; begin < end/2; ++begin)
swap(tab, begin, rand()%end);
}
When I inserted a call to shuffle()
just before the call to quickSort()
in main
, the time to sort the same data dropped to 0.026 seconds. By comparison, the standard qsort()
sorts in 0.006 seconds on this same machine.
Also, more generally, I'd be inclined to pass two pointers to swap
rather than an array and two indices. The code may be very slightly faster, but it's certainly more general.
The code could also be made more general by using the same parameters as the qsort()
routine in <stdlib.h>
but then, that's already written. :)
-
\$\begingroup\$ Isn't rand()%end a bad practice (modulo bias..) ? I know it's irrelevant in this case. \$\endgroup\$user3585425– user35854252014年04月29日 14:40:35 +00:00Commented Apr 29, 2014 at 14:40
-
1\$\begingroup\$ @user3585425: yes, it is, but as you say, it really doesn't matter in this case. Even if that were fixed, it's still a simple-minded shuffle. :) \$\endgroup\$Edward– Edward2014年04月29日 14:42:27 +00:00Commented Apr 29, 2014 at 14:42
-
2\$\begingroup\$ I don't think shuffling is a solution. The probability of getting the worst case before shuffling is exactly equal to the probability to end up with the worst case after it. A little better remedy is using more sophisticated pivot selection strategy (such as a median of 3 or 5 elements) \$\endgroup\$vnp– vnp2014年04月29日 17:13:54 +00:00Commented Apr 29, 2014 at 17:13
-
1\$\begingroup\$ @vnp: in theory, your assertion about probabilities may be valid, but in practice it definitely is not. In the real world, nearly ordered lists are actually quite common. Also, rather than tinkering with modifications to Quicksort, one could simply use a heap sort. Which is "better" depends mostly on the requirements of the problem at hand. \$\endgroup\$Edward– Edward2014年04月29日 18:02:44 +00:00Commented Apr 29, 2014 at 18:02
-
\$\begingroup\$ @Edward: 100% agree with our definition of better. Most politely disagree with all the rest. \$\endgroup\$vnp– vnp2014年04月29日 18:25:54 +00:00Commented Apr 29, 2014 at 18:25
I'd recommend to use pointers instead of indices. It is one less parameter to pass down the recursion.
Algorithms on ranges are somewhat simpler if they are given semi-open ranges (that is,
begin
is the first element, andend
is the first element beyond the range).No raw loops. Factor out the
while
loop into a function on its own. Now not only the code becomes cleaner, but you also get a very importantpartition
algorithm for free. You may also want to make one step deeper, and recognize thatdo... while
loops are also important algorithms (namely,find_forward
and find_backward`).Once the
partition
is factored out, you may want to eliminate one of the recursions. The actual recursive call shall go into the smaller partition.Terminating a recursion early is a serious optimization, but it is beyond the scope of this review.