Writing my first binary search by myself

Question 1

Is there any way in which I can improve my code? I don't know if it is 100% right, and if it can be tweaked in order to be more efficient. That's why I'm posting here.

I'm open to any suggestions :)

#include <iostream>
using namespace std;
int binarySearch(int arr[], int low, int high, int n)
{
 if (low <= high)
 {
 int i = (high + low) / 2;
 if (arr[i] == n)
 {
 return i;
 }
 else
 {
 if (arr[i] > n)
 {
 return binarySearch(arr, low, i - 1, n);
 }
 if (arr[i] < n)
 {
 return binarySearch(arr, i + 1, high, n);
 }
 }
 } 
 return -1;
}
int main()
{
 int arr[10]{ 1, 2, 3, 4, 5, 6, 8 }, n = 7;
 cout<<binarySearch(arr, 0, 6, 9);
 return 0;
}

Thanks.

Question 2

Improve the way you think about divide&conquer: multiple (typical: all) subproblems are solved, the overall solution is build from their solutions. Tends to be an improvement when effort grows faster than problem size and building a solutions from part-solutions is fast.

Question 3

I'd also declare the array const for two reasons: 1 readability, 2 compiler can make more assumptions if caller and this function are in separate code units.

Question 4

Avoid deeply nested `if`-statements

Deeply nested if-statements are hard to read. You can reduce the amount of nesting here by inverting the first if-statement and returning early, and avoiding the else-statement by using the fact that the if part already returns:

int binarySearch(int arr[], int low, int high, int n)
{
 if (low > high)
 {
 return -1;
 }
 
 int i = (high + low) / 2;
 if (arr[i] == n)
 {
 return i;
 }
 if (arr[i] > n)
 {
 return binarySearch(arr, low, i - 1, n);
 }
 else
 {
 return binarySearch(arr, i + 1, high, n);
 }
}

Use `size_t` for array indices

When dealing with sizes, counts and indices, prefer using size_t to hold their values, as that type is guaranteed to be big enough to cover any array that can be addressed by your computer.

size_t is unsigned, so it might look like you can't return -1, but it turns out that integer promotion rules make it work regardless: you can return -1, and the caller can check if (binarySearch(...) == -1), although perhaps better would be to create a constant for this, similar to std::string::npos.

Try to make it more generic

Your binary search algorithm works for plain arrays of integers, but it doesn't work for anything else. What if you want to search in an array of floats? What if you want to search in a std::vector? The standard library provides std::binary_search() which can work on a variety of containers holding any type that can be compared. It might be good practice to try to make the interface to your binary search implementation similar to that of std::binary_search(). It is not as hard as it looks! You can start by making it a template for arrays of different types:

template<typename T>
size_t binarySearch(T arr[], size_t low, size_t high, const T &n)
{
 ...
}

Once you have done that excercise, try to have it take two iterators, and return an iterator to the element if it's found, or last if not (assuming last now points to right after the end of the range to search):

template<typename It, typename T>
It binarySearch(It first, It last, const T &n)
{
 ...
}

This is a bit more advanced; you have to work within the limitations of the iterators. Last you might want to add another template parameter, typename Comp, so you can provide a custom comparison operator.

Question 5

avoiding the else-statement by using the fact that the if part already returns does that apply to the last conditional statement in the first revision of your answer, too?

Question 6

@greybeard You are right, but the last else return ... did not increase indentation levels, and in this case I think it better preserves the symmetry between the two branches.

Question 7

Thanks for the great tips. I'll edit the code accordingly.

Question 8

@KonradRudolph I copied cppreference's variable names here. I guess it's a matter of preference, and at this point consistency is the most important factor to name things, so that either all your iterator variables are named begin/end or all are named first/last.

Question 9

@G.Sliepen My bad, I confused them with front and back. In fact, t’s not just cppreference: the standard itself uses these names for half-open iterator ranges, so this is entirely appropriate and even canonical.

Question 10

Since nobody has mentioned it:

Your code contains a famous bug. The following code may overflow the int range for large arrays:

int i = (high + low) / 2;

To avoid this, use the following:

int i = low + (high - low) / 2;

Note that simply using a different data type isn’t enough — it just postpones the bug to larger arrays.

Another, unrelated issue relates to code readability: in the parameter list to binarySearch, int arr[] looks like an array declaration but (because of C++ rules), this is actually a pointer, i.e. it’s equivalent to writing int* arr. Using the int arr[] syntax is potentially misleading, and therefore best avoided.

To further improve readability, use a different name for i (what does i stand for?). mid is frequently used here.

Question 11

Ha, I wanted to mention that bug but got too riled up about the unclear parameters and then forgot :-). In what way is int arr[] misleading, i.e., how might someone misunderstand it and what would be the problem? (I don't know what you mean with C++ rules.)

Question 12

@superbrain Arrays have subtly different semantics from pointers in C and C++. The most obviously problem is that sizeof behaves differently. They’re also simply distinct types, so they call different overloaded functions. But the most obvious problem is that thinking an array is passed here reinforces a wrong mental model of the C++ type system, and using this false array syntax instead of pointer syntax is therefore almost exclusively done in error by beginners who genuinely think they’re passing an array (or who think that arrays and pointers are the same thing).

Question 13

Put simpler, the function parameter type is a pointer, not an array. Using a "fake-array" syntax implies the wrong type. The fact that C++ even has this misleading syntax is a historic mistake.

Question 14

Good point, this issue is so common that C++20 introduced a function to deal with it: std::midpoint(). And an excellent talk to go with it here: youtube.com/watch?v=sBtAGxBh-XI

Question 15

@G.Sliepen Interesting. I wasn’t aware of this function. Unfortunately, as far as I can tell, this function is actually less efficient than a + (b - a) / 2 (at least the implementation shown in the talk is!) — and in this particular scenario the latter works fine. I’m actually also confused that they settled for the less efficient implementation for the T* overload. This looks like a QOI bug.

Question 16

Should have some documentation saying what it does, i.e., what the parameters mean and what the result will be. Especially what high means, as some people make it inclusive and others make it exclusive. It's also against C++ usual name last and apparently against C++'s usual semantic of last being exclusive. One shouldn't have to analyze the implementation code in order to find all that out.

You might not want to test arr[i] == n first, as that's the least likely case, so giving it the fastest route (just one test instead of two or three) is rather an anti-optimization. Btw, why check < after you already ruled out == and >? Int pairs can't fail all three. And if you use this for other types where all three can fail (for example sets with subset/superset relationships), then... you probably shouldn't be doing that anyway.

Question 17

To stress the indispensability of documentation (and in-code documentation gets separated/lost from the code least easily): suppose this is in a multi-developer project. One checks what happens when the range doesn't contain the key and relies on, say, result < 0. Another one finds it useful to get the least index of an item no smaller than the key (à la Python bisect_left()) and modifies the code accordingly - after all, found or not can be established with a single equality check, right? (wrong: return value may be high)

Question 18

Can you give me an example of inclusive vs exclusive parameters in this DAC algorithm? The other part was clear. Thanks :)

Question 19

@OctavianNiculescu Not sure what you mean. Are you just not familiar with those words? Inclusive means the range [low,high] and exclusive means [low,high). For inclusive, arr[high] is part of the input, for exclusive it's not (the last element is instead arr[high - 1]).

Question 20

My immediate reaction on reading this is "why is it recursive"? It has a bit of a feel for a recursive algorithm because it is divide and conquer, but there is really no need to do so here. What recursion provides is an easy way to manage the context of the intermediate parts. (For example, when sorting the array a recursive solution is good because you can solve the sub parts and then combine them, and the sub parts have distinct data that has to be managed.) But the data to be managed in binary search is trivial -- namely the lowest index of the part still being searched and the upper index of the part still being searched.

Consequently, in your binary search you are paying a large cost with all those recursive calls. Remember each call requires setting up a stack frame, copying all the parameters and initializing all the variables, plus for large arrays you could potentially use a lot of stack memory. (Not likely to be a problem on regular computers, but definitely in memory limited situations.)

Here is the solution given in wikipedia. It is, you will see, pretty straightforward. The state is managed on only two variables - L and R, the leftmost and rightmost indexes of the space still being searched. Obviously you'd have to translate it into C++, but the translation is pretty straightforward.

function binary_search(A, n, T) is
 L := 0
 R := n − 1
 while L ≤ R do
 m := floor((L + R) / 2)
 if A[m] < T then
 L := m + 1
 else if A[m] > T then
 R := m − 1
 else:
 return m
 return unsuccessful

You might consider a simple example as to why recursion is the wrong approach here. Consider another potentially recursive problem -- calculating factorial. Factorial(n) is defined as all the integers between 1..n multiplied together. (So obviously it only works for integers and only works for numbers >=1.) You can define factorial like this:

function factorial(n) is
 if n == 1 return 1;
 else return n * factorial(n-1);

Is this correct? Yes, for sure but it is a terrible solution. All those extra function calls make it very expensive. You can instead simply define it as:

function factorial(n) is
 int total = 1;
 for(int i=2; i<=n; i++) total *= i;
 return total;

Now in a for loop you don't have the huge cost of n function calls. This is because the function call's purpose is to store the intermediate state, and it isn't necessary. You can store it all in one variable.

FWIW, both the recursive factorial and your binary search algorithms are tail recursive, so I suppose some really aggressive optimizer might make the conversion automatically for you. But I doubt a C++ compiler could ever be that aggressive since it requires a lot of deep information about function side effects that is pretty hard to gather, and risky to assume. (If you were using a pure functional language, for example, you would give it as you have written, and the tail recursion would generally be optimized away.)

Question 21

Your factorial example actually cannot do tail recursion; it needs to modify the return value from the recursive call before it can return itself. Note that tail recursion is quite a simple optimization, most C++ compilers will happily do that. It doesn't require deep knowledge about side effects, in fact side effects are perfectly fine to have with tail recursion.

Question 22

Thanks a lot for your answer. I'll look into it

Question 23

"why is it recursive?" — Because the algorithm is naturally expressed using recursion. There’s really absolutely nothing wrong with that. I don’t understand why you’re mentioning the factorial here at all. It’s completely unrelated to the question. Providing a terrible implementation is just generally not a good argument. For one, the recursive binary search is not terrible. For another, naturally recursive factorial implementations don’t have to be terrible. Yours happens to be, but a straightforward recursive implementation can have the same performance as an optimal, iterative version.

G. Sliepen G. Sliepen 68.9k3 gold badges74 silver badges179 bronze badges · Accepted Answer · 2020-11-23 20:55:59Z

Avoid deeply nested `if`-statements

Deeply nested if-statements are hard to read. You can reduce the amount of nesting here by inverting the first if-statement and returning early, and avoiding the else-statement by using the fact that the if part already returns:

int binarySearch(int arr[], int low, int high, int n)
{
 if (low > high)
 {
 return -1;
 }
 
 int i = (high + low) / 2;
 if (arr[i] == n)
 {
 return i;
 }
 if (arr[i] > n)
 {
 return binarySearch(arr, low, i - 1, n);
 }
 else
 {
 return binarySearch(arr, i + 1, high, n);
 }
}

Use `size_t` for array indices

When dealing with sizes, counts and indices, prefer using size_t to hold their values, as that type is guaranteed to be big enough to cover any array that can be addressed by your computer.

size_t is unsigned, so it might look like you can't return -1, but it turns out that integer promotion rules make it work regardless: you can return -1, and the caller can check if (binarySearch(...) == -1), although perhaps better would be to create a constant for this, similar to std::string::npos.

Try to make it more generic

Your binary search algorithm works for plain arrays of integers, but it doesn't work for anything else. What if you want to search in an array of floats? What if you want to search in a std::vector? The standard library provides std::binary_search() which can work on a variety of containers holding any type that can be compared. It might be good practice to try to make the interface to your binary search implementation similar to that of std::binary_search(). It is not as hard as it looks! You can start by making it a template for arrays of different types:

template<typename T>
size_t binarySearch(T arr[], size_t low, size_t high, const T &n)
{
 ...
}

Once you have done that excercise, try to have it take two iterators, and return an iterator to the element if it's found, or last if not (assuming last now points to right after the end of the range to search):

template<typename It, typename T>
It binarySearch(It first, It last, const T &n)
{
 ...
}

This is a bit more advanced; you have to work within the limitations of the iterators. Last you might want to add another template parameter, typename Comp, so you can provide a custom comparison operator.

avoiding the else-statement by using the fact that the if part already returns does that apply to the last conditional statement in the first revision of your answer, too?
@greybeard You are right, but the last else return ... did not increase indentation levels, and in this case I think it better preserves the symmetry between the two branches.
@KonradRudolph I copied cppreference's variable names here. I guess it's a matter of preference, and at this point consistency is the most important factor to name things, so that either all your iterator variables are named begin/end or all are named first/last.
@G.Sliepen My bad, I confused them with front and back. In fact, t’s not just cppreference: the standard itself uses these names for half-open iterator ranges, so this is entirely appropriate and even canonical.

Stack Exchange Network

Writing my first binary search by myself

4 Answers 4

Avoid deeply nested `if`-statements

Use `size_t` for array indices

Try to make it more generic

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Writing my first binary search by myself

4 Answers 4

Avoid deeply nested if-statements

Use size_t for array indices

Try to make it more generic

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Avoid deeply nested `if`-statements

Use `size_t` for array indices