The problem I am trying to solve can be described as take binarysearch0
and rewrite it as binarysearch1
such that binarysearch1
uses a single test inside the loop instead of two. The code I used to do this is seen below. Please let me know if you think I accomplished the task, and what I can do better.
#include <stdlib.h>
#include <stdio.h>
void printdata(int *v, int n)
{
int i = 0;
for (i = 0; i < n; ++i) {
printf("%3d", v[i]);
}
putchar('\n');
}
int binarysearch0(int x, int *v, int n)
{
int low, high, mid;
low = 0;
high = n - 1;
while (low <= high) {
mid = (low + high) / 2;
if ( x < v[mid])
high = mid -1;
else if (x > v[mid])
low = mid + 1;
else
return mid;
}
return -1;
}
int binarysearch1(int x, int *v, int n)
{
int low, high, mid;
low = 0;
high = n - 1;
while (low <= high && v[(mid = (low + high) / 2)] != x ) {
mid = (low + high) / 2;
if ( x < v[mid])
high = mid -1;
else
low = mid + 1;
}
return (v[mid] == x) ? mid : -1;
}
int main(int argc, char *argv[])
{
int v[10][10] = { { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9},
{10,11,12,13,14,15,16,17,18,19},
{20,21,22,23,24,25,26,27,28,29},
{30,31,32,33,34,35,36,37,38,39},
{40,41,42,43,44,45,46,47,48,49},
{50,51,52,53,54,55,56,57,58,59},
{60,61,62,63,64,65,66,67,68,69},
{70,71,72,73,74,75,76,77,78,79},
{80,81,82,83,84,85,86,87,88,89},
{90,91,92,93,94,95,96,97,97,99}
};
int i = 0;
for (i = 0; i < 10; ++i) {
int search = 0;
(i == 0) ? (search = 1000) : (search = v[i][0] + rand()%10);
if(binarysearch1(search, v[i], 10) > -1) {
printf("%d was found in the data: %14s", search, " ");
printdata(v[i], 10);
} else {
printf("%d was NOT found in the data: %10s", search, " ");
printdata(v[i], 10);
}
putchar('\n');
}
return 0;
}
3 Answers 3
You're not really changing the number of comparisons inside the loop here: all you've done is move it from the loop body to the loop termination condition. Note that in many situations, comparisons on the array elements (which could be strings or more complex objects) are a lot costlier than comparisons on the indices.
You can find several solutions to this exercise on the clc-wiki. I think one of them (Paul Griffith's) misses the point (it's pretty much the same as yours); I'm going to show explain the reasoning behind Andrew Tesker and Colin Barker's solutions.
In the original function binsearch
, inside the loop, there are three possible situations: x<v[mid]
, x==v[mid]
and x>v[mid]
. Sometimes, a comparison function has three outcomes: less than, equal or greater than; in such cases, the approach in the original function is best. For example, if we were comparing strings with strcmp
:
int cmp = strcmp(x, v[mid]);
if (cmp < 0) high = mid - 1;
else if (cmp > 0) low = mid + 1;
else return mid;
In other cases, the only primitive you have is a less-than or less-or-equal-to, and to test for equality requires a second comparison. Thus the method presented in K&R requires two calls to the comparison function per iteration through the loop. It is more efficient to require a single comparison. Now we absolutely need to distinguish between the less-than and the greater-than case, so this means we're not going to be able to deal with the equality case inside the loop. So we'll keep going either up or down in the equality case. Since we haven't vetted the value v[mid]
in one of the cases, we need to keep it inside the search space.
if (x < v[mid]) high = mid - 1;
else low = mid;
What happens to the loop termination? Now, we can get stuck with low == high
and x >= v[low]
. However, this is easily resolved by ending the loop when low == high
.
while (low < high) {
mid = (low + high + 1) / 2;
if (x < v[mid]) high = mid - 1;
else low = mid;
}
Note that I set mid
to low + high + 1
. In the case where low + 1 == high
, (low + high) / 2
is low
, and if x
is still too small we'd set mid
to low
and loop forever (thanks to Elyasin for noticing that my original answer had this bug). Instead in this case we arrange for mid
to be set to high
. If low <= high
we exit the loop anyway, and in other cases we'll continue looping and this change only causes the middle to be rounded up instead of down.
At the exit of the loop, if there is a matching element in the array, then v[low]
is equal to it. So we perform a final equality test:
return x == v[low] ? low : -1;
The number of element comparisons performed by the original version is anywhere from 1 to 2*ceil(log2(n)+1) where n is the number of elements in the array. The number of element comparisons in the version with a single comparison within the loop is always ceil(log2(n))+2. That's a smaller maximum, but not necessarily a gain, because the original version will terminate earlier if it finds the element. The original version is preferable when there is a good chance that the element will be in the array. The modified version is preferable when the element is rarely present.
-
\$\begingroup\$ Nice answer. I think there is a bug. I would use the assignment
mid = (low + high) / 2 + 1;
. \$\endgroup\$Ely– Ely2015年08月09日 08:21:16 +00:00Commented Aug 9, 2015 at 8:21 -
\$\begingroup\$ @Elyasin Why? What's the bug? \$\endgroup\$Gilles 'SO- stop being evil'– Gilles 'SO- stop being evil'2015年08月09日 10:30:46 +00:00Commented Aug 9, 2015 at 10:30
-
\$\begingroup\$ I think it loops infinitely, the reason being that the division by integers is truncated. Try to find the last element in a list. \$\endgroup\$Ely– Ely2015年08月10日 02:41:35 +00:00Commented Aug 10, 2015 at 2:41
-
\$\begingroup\$ @Elyasin Got it. But (low+high)/2+1 is wrong too: it can cause an access one past the end of the array when low==high. \$\endgroup\$Gilles 'SO- stop being evil'– Gilles 'SO- stop being evil'2015年08月10日 07:32:34 +00:00Commented Aug 10, 2015 at 7:32
-
\$\begingroup\$ P.S. Beware of bugs in the above code; I have only tried it, not proved it correct. \$\endgroup\$Gilles 'SO- stop being evil'– Gilles 'SO- stop being evil'2015年08月10日 07:42:40 +00:00Commented Aug 10, 2015 at 7:42
In binarysearch1
, you've replaced one test in the loop body with one test in the loop condition. That doesn't reduce the number of tests, and I agree with Gilles, binarysearch0
is clearer. I would recommend keeping that. However, in both searches, the calculation of mid = (low + high) / 2
can overflow if the array you're searching is large. Use an overflow-safe calculation instead. The readable one is mid = low + (high - low) / 2
(assuming you can guarantee that low
and high
are never negative, INT_MAX - INT_MIN
for example would overflow).
I would not waste any time at all worrying about factoring away one of the two comparisons, because a good compiler should be able to optimize them into one anyway. (Maybe during Kernighan and Ritchie's time compilers were not smart enough to pull such tricks, so maybe this question was meaningful back then, but believe me, they have gone a very long way since then!)
The compiler knows that x
and v[mid]
(or the result of calling the comparison function) are not changed by the if
statement, so it can emit the code which compares these two operands only once. The results of the comparison will be stored in the Carry
(C) and Zero
(Z) bits of the Flags
register of an x86 CPU, as follows: If Z is set, it means that the two operands of the comparison were equal; otherwise, if C is set, then the second operand was greater than the first; otherwise, (both flags are clear,) the second operand was smaller than the first. So, if the compiler is smart at all, after the comparison instruction it will emit two consecutive conditional jump instructions: one which will branch to the low = mid + 1
part if the result of the comparison was 'greater than', (Z=0, C=0,) immediately followed by another which will branch to the high = mid - 1
part if the result of the comparison was 'less than'. (Z=0, C=1.) A jump instruction never modifies the Flags
register of the CPU, so after the first jump instruction has examined the flags, and decided not to branch, the flags will still be intact for the second jump instruction to also examine them. And if the second jump instruction decides not to branch either, then the CPU will fall through to the code which will handle the last remaining case, where the operands were equal.
I would be willing to bet money that a decent C compiler will do this for you. Even if it turns out that it does not, the whole topic can nonetheless be dismissed as belonging to the general class of problems that compilers should be taking care of for us, so that we can spend our time thinking about more useful things.
binarysearch0
clearer thanbinarysearch1
, and here's no difference efficiency-wise. If this is a homework question, how exactly was it formulated? If not, what do you hope to gain withbinarysearch1
? \$\endgroup\$if (v[mid] < x){ low = mid+1; } else { high = mid; }
? \$\endgroup\$