Binary search optimization: Kernighan & Ritchie 3-1

Question 1

The problem I am trying to solve can be described as take binarysearch0 and rewrite it as binarysearch1 such that binarysearch1 uses a single test inside the loop instead of two. The code I used to do this is seen below. Please let me know if you think I accomplished the task, and what I can do better.

#include <stdlib.h>
#include <stdio.h>
void printdata(int *v, int n)
{
 int i = 0;
 for (i = 0; i < n; ++i) {
 printf("%3d", v[i]);
 }
 putchar('\n');
}
int binarysearch0(int x, int *v, int n)
{
 int low, high, mid;
 low = 0;
 high = n - 1;
 while (low <= high) {
 mid = (low + high) / 2;
 if ( x < v[mid])
 high = mid -1;
 else if (x > v[mid])
 low = mid + 1;
 else
 return mid;
 }
 return -1;
}
int binarysearch1(int x, int *v, int n)
{
 int low, high, mid;
 low = 0;
 high = n - 1;
 while (low <= high && v[(mid = (low + high) / 2)] != x ) {
 mid = (low + high) / 2;
 if ( x < v[mid])
 high = mid -1;
 else
 low = mid + 1;
 }
 return (v[mid] == x) ? mid : -1;
}
int main(int argc, char *argv[])
{
 int v[10][10] = { { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
 {10,11,12,13,14,15,16,17,18,19},
 {20,21,22,23,24,25,26,27,28,29},
 {30,31,32,33,34,35,36,37,38,39},
 {40,41,42,43,44,45,46,47,48,49},
 {50,51,52,53,54,55,56,57,58,59},
 {60,61,62,63,64,65,66,67,68,69},
 {70,71,72,73,74,75,76,77,78,79},
 {80,81,82,83,84,85,86,87,88,89},
 {90,91,92,93,94,95,96,97,97,99}
 };
 int i = 0;
 for (i = 0; i < 10; ++i) {
 int search = 0;
 (i == 0) ? (search = 1000) : (search = v[i][0] + rand()%10);
 if(binarysearch1(search, v[i], 10) > -1) {
 printf("%d was found in the data: %14s", search, " ");
 printdata(v[i], 10);
 } else {
 printf("%d was NOT found in the data: %10s", search, " ");
 printdata(v[i], 10);
 }
 putchar('\n');
 }
 return 0;
}

Question 2

I find binarysearch0 clearer than binarysearch1, and here's no difference efficiency-wise. If this is a homework question, how exactly was it formulated? If not, what do you hope to gain with binarysearch1?

Question 3

This is not homework, but it comes from the C Programming Language book by Kernighan and Ritchie. He asks you to reduce the number of conditionals in the loop and compare run-times.

Question 4

Then what about if (v[mid] < x){ low = mid+1; } else { high = mid; }?

Question 5

You're not really changing the number of comparisons inside the loop here: all you've done is move it from the loop body to the loop termination condition. Note that in many situations, comparisons on the array elements (which could be strings or more complex objects) are a lot costlier than comparisons on the indices.

You can find several solutions to this exercise on the clc-wiki. I think one of them (Paul Griffith's) misses the point (it's pretty much the same as yours); I'm going to show explain the reasoning behind Andrew Tesker and Colin Barker's solutions.

In the original function binsearch, inside the loop, there are three possible situations: x<v[mid], x==v[mid] and x>v[mid]. Sometimes, a comparison function has three outcomes: less than, equal or greater than; in such cases, the approach in the original function is best. For example, if we were comparing strings with strcmp:

 int cmp = strcmp(x, v[mid]);
 if (cmp < 0) high = mid - 1;
 else if (cmp > 0) low = mid + 1;
 else return mid;

In other cases, the only primitive you have is a less-than or less-or-equal-to, and to test for equality requires a second comparison. Thus the method presented in K&R requires two calls to the comparison function per iteration through the loop. It is more efficient to require a single comparison. Now we absolutely need to distinguish between the less-than and the greater-than case, so this means we're not going to be able to deal with the equality case inside the loop. So we'll keep going either up or down in the equality case. Since we haven't vetted the value v[mid] in one of the cases, we need to keep it inside the search space.

 if (x < v[mid]) high = mid - 1;
 else low = mid;

What happens to the loop termination? Now, we can get stuck with low == high and x >= v[low]. However, this is easily resolved by ending the loop when low == high.

while (low < high) {
 mid = (low + high + 1) / 2;
 if (x < v[mid]) high = mid - 1;
 else low = mid;
}

Note that I set mid to low + high + 1. In the case where low + 1 == high, (low + high) / 2 is low, and if x is still too small we'd set mid to low and loop forever (thanks to Elyasin for noticing that my original answer had this bug). Instead in this case we arrange for mid to be set to high. If low <= high we exit the loop anyway, and in other cases we'll continue looping and this change only causes the middle to be rounded up instead of down.

At the exit of the loop, if there is a matching element in the array, then v[low] is equal to it. So we perform a final equality test:

return x == v[low] ? low : -1;

The number of element comparisons performed by the original version is anywhere from 1 to 2*ceil(log2(n)+1) where n is the number of elements in the array. The number of element comparisons in the version with a single comparison within the loop is always ceil(log2(n))+2. That's a smaller maximum, but not necessarily a gain, because the original version will terminate earlier if it finds the element. The original version is preferable when there is a good chance that the element will be in the array. The modified version is preferable when the element is rarely present.

Question 6

Nice answer. I think there is a bug. I would use the assignment mid = (low + high) / 2 + 1;.

Question 7

@Elyasin Why? What's the bug?

Question 8

I think it loops infinitely, the reason being that the division by integers is truncated. Try to find the last element in a list.

Question 9

@Elyasin Got it. But (low+high)/2+1 is wrong too: it can cause an access one past the end of the array when low==high.

Question 10

P.S. Beware of bugs in the above code; I have only tried it, not proved it correct.

Question 11

In binarysearch1, you've replaced one test in the loop body with one test in the loop condition. That doesn't reduce the number of tests, and I agree with Gilles, binarysearch0 is clearer. I would recommend keeping that. However, in both searches, the calculation of mid = (low + high) / 2 can overflow if the array you're searching is large. Use an overflow-safe calculation instead. The readable one is mid = low + (high - low) / 2 (assuming you can guarantee that low and high are never negative, INT_MAX - INT_MIN for example would overflow).

Question 12

I would not waste any time at all worrying about factoring away one of the two comparisons, because a good compiler should be able to optimize them into one anyway. (Maybe during Kernighan and Ritchie's time compilers were not smart enough to pull such tricks, so maybe this question was meaningful back then, but believe me, they have gone a very long way since then!)

The compiler knows that x and v[mid] (or the result of calling the comparison function) are not changed by the if statement, so it can emit the code which compares these two operands only once. The results of the comparison will be stored in the Carry (C) and Zero (Z) bits of the Flags register of an x86 CPU, as follows: If Z is set, it means that the two operands of the comparison were equal; otherwise, if C is set, then the second operand was greater than the first; otherwise, (both flags are clear,) the second operand was smaller than the first. So, if the compiler is smart at all, after the comparison instruction it will emit two consecutive conditional jump instructions: one which will branch to the low = mid + 1 part if the result of the comparison was 'greater than', (Z=0, C=0,) immediately followed by another which will branch to the high = mid - 1 part if the result of the comparison was 'less than'. (Z=0, C=1.) A jump instruction never modifies the Flags register of the CPU, so after the first jump instruction has examined the flags, and decided not to branch, the flags will still be intact for the second jump instruction to also examine them. And if the second jump instruction decides not to branch either, then the CPU will fall through to the code which will handle the last remaining case, where the operands were equal.

I would be willing to bet money that a decent C compiler will do this for you. Even if it turns out that it does not, the whole topic can nonetheless be dismissed as belonging to the general class of problems that compilers should be taking care of for us, so that we can spend our time thinking about more useful things.

score 8 · Accepted Answer · 2011-11-19 22:55:57Z

You're not really changing the number of comparisons inside the loop here: all you've done is move it from the loop body to the loop termination condition. Note that in many situations, comparisons on the array elements (which could be strings or more complex objects) are a lot costlier than comparisons on the indices.

You can find several solutions to this exercise on the clc-wiki. I think one of them (Paul Griffith's) misses the point (it's pretty much the same as yours); I'm going to show explain the reasoning behind Andrew Tesker and Colin Barker's solutions.

In the original function binsearch, inside the loop, there are three possible situations: x<v[mid], x==v[mid] and x>v[mid]. Sometimes, a comparison function has three outcomes: less than, equal or greater than; in such cases, the approach in the original function is best. For example, if we were comparing strings with strcmp:

 int cmp = strcmp(x, v[mid]);
 if (cmp < 0) high = mid - 1;
 else if (cmp > 0) low = mid + 1;
 else return mid;

In other cases, the only primitive you have is a less-than or less-or-equal-to, and to test for equality requires a second comparison. Thus the method presented in K&R requires two calls to the comparison function per iteration through the loop. It is more efficient to require a single comparison. Now we absolutely need to distinguish between the less-than and the greater-than case, so this means we're not going to be able to deal with the equality case inside the loop. So we'll keep going either up or down in the equality case. Since we haven't vetted the value v[mid] in one of the cases, we need to keep it inside the search space.

 if (x < v[mid]) high = mid - 1;
 else low = mid;

What happens to the loop termination? Now, we can get stuck with low == high and x >= v[low]. However, this is easily resolved by ending the loop when low == high.

while (low < high) {
 mid = (low + high + 1) / 2;
 if (x < v[mid]) high = mid - 1;
 else low = mid;
}

Note that I set mid to low + high + 1. In the case where low + 1 == high, (low + high) / 2 is low, and if x is still too small we'd set mid to low and loop forever (thanks to Elyasin for noticing that my original answer had this bug). Instead in this case we arrange for mid to be set to high. If low <= high we exit the loop anyway, and in other cases we'll continue looping and this change only causes the middle to be rounded up instead of down.

At the exit of the loop, if there is a matching element in the array, then v[low] is equal to it. So we perform a final equality test:

return x == v[low] ? low : -1;

The number of element comparisons performed by the original version is anywhere from 1 to 2*ceil(log2(n)+1) where n is the number of elements in the array. The number of element comparisons in the version with a single comparison within the loop is always ceil(log2(n))+2. That's a smaller maximum, but not necessarily a gain, because the original version will terminate earlier if it finds the element. The original version is preferable when there is a good chance that the element will be in the array. The modified version is preferable when the element is rarely present.

Nice answer. I think there is a bug. I would use the assignment mid = (low + high) / 2 + 1;.
I think it loops infinitely, the reason being that the division by integers is truncated. Try to find the last element in a list.
@Elyasin Got it. But (low+high)/2+1 is wrong too: it can cause an access one past the end of the array when low==high.
P.S. Beware of bugs in the above code; I have only tried it, not proved it correct.

Stack Exchange Network

Binary search optimization: Kernighan & Ritchie 3-1

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Binary search optimization: Kernighan & Ritchie 3-1

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions