I have recently come across a promising Kth selection routine that reportedly outperforms quickselect the Floyd, Rivest Select routine. This Wikipedia article provides a pseudocode version which I tried to translate to C.
The link to the actual paper with the code in ALGOL.
This is my translation of the code
void select(float array[], int left, int right, int k)
{
#define sign(x) ((x > 0.0) ? 1 : ((x < 0.0) ? (-1) : 0))
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
int n, i, j, ll, rr;
int s, sd;
float z, t;
while (right > left)
{
// use select recursively to sample a smaller set of size s
// the arbitrary constants 600 and 0.5 are used in the original
// version to minimize execution time
if (right - left > 600)
{
n = right - left + 1;
i = k - left + 1;
z = log(n);
s = 0.5 * exp(2 * z/3);
sd = 0.5 * sqrt(z * s * (n - s)/n) * sign(i - n/2);
ll = max(left, k - i * s/n + sd);
rr = min(right, k + (n - i) * s/n + sd);
select(array, ll, rr, k);
}
// partition the elements between left and right around t
t = array[k];
i = left;
j = right;
F_SWAP(array[left],array[k]);
if (array[right] > t)
{
F_SWAP(array[right],array[left]);//swap array[right] and array[left]
}
while (i < j)
{
F_SWAP(array[i],array[j]);
i = i + 1;
j = j -1;
while (array[i] < t)
{
i = i + 1;
}
while (array[j] > t)
{
j = j - 1;
}
}
if (array[left] == t)
{
F_SWAP (array[left], array[j])
}
else
{
j = j + 1;
F_SWAP (array[j],array[right])
}
// adjust left and right towards the boundaries of the subset
// containing the (k - left + 1)th smallest element
if (j <= k)
{
left = j + 1;
}
if (k <= j)
{
right = j - 1;
}
}
#undef sign
#undef F_SWAP
}
It seems to work ok, however, I have many questions about this routine which puzzle me.
First, according to what I have read, this routine is supposed to be faster than Hoare's QuickSelect and Wirth's Selection algorithms. However, in my code it seems to be slower.
What exactly are the magic numbers 600 and .5 doing here? The pseudocode explains that they are arbitrary constants to minimize execution time. Was that specific to the ALGOL language?
What exactly are the log, exponential, square root and Sigma performing to actually find the Kth position?
I think that the performance problem I am experiencing with my translation of this routine is probably due to exponent and logarithm.
-
I'm voting to close this question as off-topic because it is asking for an explanation of what particular parts of the code do.RubberDuck– RubberDuck2015年05月23日 16:48:31 +00:00Commented May 23, 2015 at 16:48
-
Maybe the non recursive implementation is supposed to be faster? Also, i can't open the paper on my phone unfortunately, but if its old, the fact that it isn't faster for you could be due to how hardware has changed over the years. For instance branching is less of an issue than it used to be, and memory access is a lot slower compared to CPU instruction speeds than it used to be , especially when its not accessed in cache friendly patterns. Also dumb question but did you run in release?Alan Wolfe– Alan Wolfe2015年05月23日 22:10:04 +00:00Commented May 23, 2015 at 22:10
-
I got a similar comparison of results when I ran the routine in release. I am running it against quick select, quick select with Median of 3, Wirth Median, Wirth with Median of 3 and the Torbin Median. In release or debug, I get a comparable time frame. Perhaps modern hardware handles quick select better as you said.Andy Dansby– Andy Dansby2015年05月24日 01:18:14 +00:00Commented May 24, 2015 at 1:18
1 Answer 1
After fidgeting around with the code a bit, I've got some better results. I went back to the original paper and ignored the wikipedia page. I've compared the algorithm to other quick select routines with some great results.
Ok, here are the methods I have been playing with. Note these are for floating point and also that I changed my method from a void.
Quick Select ver 1
float quickselect(float *arr, int length, int kTHvalue)
{
#define F_SWAP(a, b) { tmp = arr[a]; arr[a] = arr[b]; arr[b] = tmp; }
int i, st;
float tmp;
for (st = i = 0; i < length - 1; i++)
{
if (arr[i] > arr[length-1]) continue;
F_SWAP(i, st);
st++;
}
F_SWAP(length-1, st);
#undef F_SWAP
return kTHvalue == st ?arr[st]
:st > kTHvalue ? quickselect(arr, st, kTHvalue)
: quickselect(arr + st, length - st, kTHvalue - st);
}
I think the original source is Rosetta code. It works rather quickly, but after looking at some other codes, another quick select function was written.
Quick Select ver 2
float quickselect2(float *arr, int length, int kthLargest)
{
#define F_SWAP(a, b) { tmp = a; a = b; b = tmp; }
int left = 0;
int right = length - 1;
int pos;
float tmp;
for (int j = left; j < right; j++)
{
float pivot = arr[kthLargest];
F_SWAP(arr[kthLargest], arr[right]);
for (int i = pos = left; i < right; i++)
{
if (arr[i] < pivot)
{
F_SWAP(arr[i], arr[pos]);
pos++;
}
}
F_SWAP(arr[right], arr[pos]);
#undef F_SWAP
if (pos == kthLargest) break;
if (pos < kthLargest) left = pos + 1;
else right = pos - 1;
}
return arr[kthLargest];
}
I looked around and found yet another quick select that had some good promise. It was a quick select with a Median of 3
float quickselect_MO3(float *arr, const int length, const int kTHvalue)
{
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
unsigned int low = 0;
unsigned int high = length - 1;
for (unsigned int j = low; j < high; j++)
{
if (high <= low) // One element only
return arr[kTHvalue];
if (high == low + 1)
{ // Two elements only
if (arr[low] > arr[high])
F_SWAP(arr[low], arr[high]);
return arr[kTHvalue];
}
//median of 3
int middle = (low + high) / 2;
if (arr[middle] > arr[high])
F_SWAP(arr[middle], arr[high]);
if (arr[low] > arr[high])
F_SWAP(arr[low], arr[high]);
if (arr[middle] > arr[low])
F_SWAP(arr[middle], arr[low]);
// Swap low item (now in position middle) into position (low+1)
F_SWAP(arr[middle], arr[low+1]);
// Nibble from each end towards middle, swapping items when stuck
unsigned int ll = low + 1;
unsigned int hh = high - 1;//unsigned int hh = high;
for (unsigned int k = ll; k < hh; k++)
{
do ll++; while (arr[low] > arr[ll]);
do hh--; while (arr[hh] > arr[low]);
if (hh < ll)
break;
F_SWAP(arr[ll], arr[hh]);
}
// Swap middle item (in position low) back into correct position
F_SWAP(arr[low], arr[hh]);
// Re-set active partition
if (hh <= kTHvalue)
low = ll;
if (hh >= kTHvalue)
high = hh - 1;
}
#undef F_SWAP
}
The Median of 3 is fast and rather impressive.
Next, with my rewritten (from the original ALGOL converted in the paper) Floyd Routine.
the Floyd median function ver 1
float select(float array[], int left, int right, int k)
{
#define sign(x) ((x > 0.0) ? 1 : ((x < 0.0) ? (-1) : 0))
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
int i;
right = right - 1;
while (right > left)
{
// use select recursively to sample a smaller set of size s
// the arbitrary constants 600 and 0.5 are used in the original
// version to minimize execution time
if (right - left > right)
{
int n = right - left + 1;
i = k - left + 1;
float z = logf(n);
float s = 0.5 * expf(2 * z/3);
float sd = 0.5 * sqrtf(z * s * (n - s) / n) * sign(i - n / 2);
int ll = max(left, k - 1 * s/n + sd);
int rr = min(right, k + (n - 1) * s/n + sd);
select(array, ll, rr, k);
}
// partition the elements between left and right around t
float t = array[k];
i = left;
int j = right;
F_SWAP(array[left],array[k]);
if (array[right] > t)
{
F_SWAP(array[right],array[left]);
}
while (i < j)
{
F_SWAP(array[i],array[j]);
i++;
j--;
while (array[i] < t)
{
i++;
}
while (array[j] > t)
{
j--;
}
}
if (array[left] == t)
{
F_SWAP (array[left], array[j])
}
else
{
j++;
F_SWAP (array[j],array[right])
}
// adjust left and right towards the boundaries of the subset
// containing the (k - left + 1)th smallest element
if (j <= k)
{
left = j + 1;
}
if (k <= j)
{
right = j - 1;
}
}
return array[k];
#undef sign
#undef F_SWAP
}
Finally, I decided to remove the Log function, Exponent and square root functions and got the same output and even a better time.
the Floyd median function ver 2
float select1(float array[], int left, int right, int k)
{
#define sign(x) ((x > 0.0) ? 1 : ((x < 0.0) ? (-1) : 0))
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
int i;
right = right - 1;
while (right > left)
{
// use select recursively to sample a smaller set of size s
// the arbitrary constants 600 and 0.5 are used in the original
// version to minimize execution time
if (right - left > right)
{
int n = right - left + 1;
i = k - left + 1;
int s = (2 * n / 3);
int sd = (n * s * (n - s) / n) * sign(i - n / 2);
int ll = max(left, k - i * s / n + sd);
int rr = min(right, k + (n - i) * s / n + sd);
select1(array, ll, rr, k);
}
// partition the elements between left and right around t
float t = array[k];
i = left;
int j = right;
F_SWAP(array[left],array[k]);
if (array[right] > t)
{
F_SWAP(array[right],array[left]);
}
while (i < j)
{
F_SWAP(array[i],array[j]);
i++;
j--;
while (array[i] < t)
{
i++;
}
while (array[j] > t)
{
j--;
}
}
if (array[left] == t)
{
F_SWAP (array[left], array[j])
}
else
{
j++;
F_SWAP (array[j],array[right])
}
// adjust left and right towards the boundaries of the subset
// containing the (k - left + 1)th smallest element
if (j <= k)
{
left = j + 1;
}
if (k <= j)
{
right = j - 1;
}
}
return array[k];
#undef sign
#undef F_SWAP
}
Here is a chart of the results
enter image description here
EDIT I coded the Floyd Routine with the median of 3 and updated the chart. It rates about the same speed as quick select with Median of 3, slightly slower. It deserves a little more exploration.
float select3(float array[], int left, int right, int k)
{
#define sign(x) ((x > 0.0) ? 1 : ((x < 0.0) ? (-1) : 0))
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
int i;
right = right - 1;
while (right > left)
{
if( array[k] < array[left] ) F_SWAP(array[left],array[k]);
if( array[right] < array[left] ) F_SWAP(array[left],array[right]);
if( array[right] < array[k] ) F_SWAP(array[k],array[right]);
if (right - left > right)
{
int n = right - left + 1;
i = k - left + 1;
int s = (2 * n / 3);
int sd = (n * s * (n - s) / n) * sign(i - n / 2);
int ll = max(left, k - i * s / n + sd);
int rr = min(right, k + (n - i) * s / n + sd);
select1(array, ll, rr, k);
}
// partition the elements between left and right around t
float t = array[k];
i = left;
int j = right;
F_SWAP(array[left],array[k]);
if (array[right] > t)
{
F_SWAP(array[right],array[left]);
}
while (i < j)
{
F_SWAP(array[i],array[j]);
i++;
j--;
while (array[i] < t)
{
i++;
}
while (array[j] > t)
{
j--;
}
}
if (array[left] == t)
{
F_SWAP (array[left], array[j])
}
else
{
j++;
F_SWAP (array[j],array[right])
}
// adjust left and right towards the boundaries of the subset
// containing the (k - left + 1)th smallest element
if (j <= k)
{
left = j + 1;
}
if (k <= j)
{
right = j - 1;
}
}
return array[k];
#undef sign
#undef F_SWAP
}
As you can see, the Floyd routine is faster than any of the quick selects other than the version with the median of 3. I don't see why the median of 3 could not be added to the Floyd routine and I will look at that next. As for the Floyd routine, of course removing the floating point log, exp and sqrt functions speeds up the routine.
I still haven't figured out why Floyd put it there in the first place.
edit 6-7-15 Here is the non-recursive version, runs at the same speed as the recursive version.
float select7MO3(float array[], const int length, const int kTHvalue)
{
#define sign(x) ((x > 0) ? 1 : ((x < 0) ? (-1) : 0))
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
int left = 0;
int i;
int right = length - 1;
int rr = right;
int ll = left;
while (right > left)
{
if( array[kTHvalue] < array[left] ) F_SWAP(array[left],array[kTHvalue]);
if( array[right] < array[left] ) F_SWAP(array[left],array[right]);
if( array[right] < array[kTHvalue] ) F_SWAP(array[kTHvalue],array[right]);
if ((right - left) > kTHvalue)
{
int n = right - left + 1;
i = kTHvalue - left + 1;
int s = (2 * n / 3);
int sd = (n * s * (n - s) / n) * sign(i - n);
ll = max(left, kTHvalue - i * s / n + sd);
rr = min(right, kTHvalue + (n - i) * s / n + sd);
}
// partition the elements between left and right around t
float t = array[kTHvalue];
i = left;
int j = right;
F_SWAP(array[left],array[kTHvalue]);
if (array[right] > t)
{
F_SWAP(array[right],array[left]);
}
while (i < j)
{
F_SWAP(array[i],array[j]);
i++;
j--;
while (array[i] < t)
{
i++;
}
while (array[j] > t)
{
j--;
}
}
if (array[left] == t)
{
i--;
F_SWAP (array[left], array[j])
}
else
{
j++;
F_SWAP (array[j],array[right])
}
// adjust left and right towards the boundaries of the subset
// containing the (k - left + 1)th smallest element
if (j <= kTHvalue)
{
left = j + 1;
}
else if (kTHvalue <= j)
{
right = j - 1;
}
}
return array[kTHvalue];
#undef sign
#undef F_SWAP
}
edit 6-13-15
I experimented some more with the Floyd selection routine and started to compare his routine against N. Wirth selection routine. An idea came across, what would happen if I combined his routine with Floyd's routine. This is what I came up with.
float FloydWirth_kth(float arr[], const int length, const int kTHvalue)
{
#define F_SWAP(a,b) { float temp=(a);(a)=(b);(b)=temp; }
#define SIGNUM(x) ((x) < 0 ? -1 : ((x) > 0 ? 1 : (x)))
int left = 0;
int right = length - 1;
int left2 = 0;
int right2 = length - 1;
//while (left < right)
while (left < right)
{
if( arr[right2] < arr[left2] ) F_SWAP(arr[left2],arr[right2]);
if( arr[right2] < arr[kTHvalue] ) F_SWAP(arr[kTHvalue],arr[right2]);
if( arr[kTHvalue] < arr[left2] ) F_SWAP(arr[left2],arr[kTHvalue]);
int rightleft = right - left;
if (rightleft < kTHvalue)
{
int n = right - left + 1;
int ii = kTHvalue - left + 1;
int s = (n + n) / 3;
int sd = (n * s * (n - s) / n) * SIGNUM(ii - n / 2);
int left2 = max(left, kTHvalue - ii * s / n + sd);
int right2 = min(right, kTHvalue + (n - ii) * s / n + sd);
}
float x=arr[kTHvalue];
while ((right2 > kTHvalue) && (left2 < kTHvalue))
{
do
{
left2++;
}while (arr[left2] < x);
do
{
right2--;
}while (arr[right2] > x);
//F_SWAP(arr[left2],arr[right2]);
F_SWAP(arr[left2],arr[right2]);
}
left2++;
right2--;
if (right2 < kTHvalue)
{
while (arr[left2]<x)
{
left2++;
}
left = left2;
right2 = right;
}
if (kTHvalue < left2)
{
while (x < arr[right2])
{
right2--;
}
right = right2;
left2 = left;
}
if( arr[left] < arr[right] ) F_SWAP(arr[right],arr[left]);
}
#undef F_SWAP
#undef SIGNUM
return arr[kTHvalue];
}
The speed difference is amazing (well at least to me). Here is a chart showing the speeds of the various routines.
Floyd Selection routine 489 speeds
-
Good analysis and investigation!Alan Wolfe– Alan Wolfe2015年05月25日 03:16:21 +00:00Commented May 25, 2015 at 3:16
-
In your Floyd-Wirth hybrid, it looks like the
rightleft < kTHvalue
branch is optimized away. Was this intentional?Azmisov– Azmisov2016年11月29日 23:49:05 +00:00Commented Nov 29, 2016 at 23:49 -
I've been digging into selection algorithms more, and I'm amazed how fast your partitioning scheme is (on the Floyd+Wirth). I can't find this method listed in either Wirth's code or the original Floyd-Rivest paper. Did you come up with it yourself?Azmisov– Azmisov2017年01月27日 02:27:11 +00:00Commented Jan 27, 2017 at 2:27
-
This was one that I came up by myself for a median filter I was working on for image processing. I wanted to handle any bit depth including floating point. Just pure experimentation by tweaking and combining code.Andy Dansby– Andy Dansby2017年01月29日 10:13:30 +00:00Commented Jan 29, 2017 at 10:13
-
Given the input arr[10, 7, 9, 7, 2, 8, 8, 1, 9, 4], kTHvalue=5 When I test "select7MO3" I get 8 as expected, but "FloydWirth_kth" returns 9.Carlos Hoyos– Carlos Hoyos2017年03月14日 12:16:12 +00:00Commented Mar 14, 2017 at 12:16
Explore related questions
See similar questions with these tags.