Is there any way in which I can improve my code? I don't know if it is 100% right, and if it can be tweaked in order to be more efficient. That's why I'm posting here.
I'm open to any suggestions :)
#include <iostream>
using namespace std;
int binarySearch(int arr[], int low, int high, int n)
{
if (low <= high)
{
int i = (high + low) / 2;
if (arr[i] == n)
{
return i;
}
else
{
if (arr[i] > n)
{
return binarySearch(arr, low, i - 1, n);
}
if (arr[i] < n)
{
return binarySearch(arr, i + 1, high, n);
}
}
}
return -1;
}
int main()
{
int arr[10]{ 1, 2, 3, 4, 5, 6, 8 }, n = 7;
cout<<binarySearch(arr, 0, 6, 9);
return 0;
}
Thanks.
4 Answers 4
Avoid deeply nested if
-statements
Deeply nested if
-statements are hard to read. You can reduce the amount of nesting here by inverting the first if
-statement and returning early, and avoiding the else
-statement by using the fact that the if
part already returns:
int binarySearch(int arr[], int low, int high, int n)
{
if (low > high)
{
return -1;
}
int i = (high + low) / 2;
if (arr[i] == n)
{
return i;
}
if (arr[i] > n)
{
return binarySearch(arr, low, i - 1, n);
}
else
{
return binarySearch(arr, i + 1, high, n);
}
}
Use size_t
for array indices
When dealing with sizes, counts and indices, prefer using size_t
to hold their values, as that type is guaranteed to be big enough to cover any array that can be addressed by your computer.
size_t
is unsigned, so it might look like you can't return -1
, but it turns out that integer promotion rules make it work regardless: you can return -1
, and the caller can check if (binarySearch(...) == -1)
, although perhaps better would be to create a constant for this, similar to std::string::npos
.
Try to make it more generic
Your binary search algorithm works for plain arrays of integers, but it doesn't work for anything else. What if you want to search in an array of float
s? What if you want to search in a std::vector
? The standard library provides std::binary_search()
which can work on a variety of containers holding any type that can be compared. It might be good practice to try to make the interface to your binary search implementation similar to that of std::binary_search()
. It is not as hard as it looks! You can start by making it a template for arrays of different types:
template<typename T>
size_t binarySearch(T arr[], size_t low, size_t high, const T &n)
{
...
}
Once you have done that excercise, try to have it take two iterators, and return an iterator to the element if it's found, or last
if not (assuming last
now points to right after the end of the range to search):
template<typename It, typename T>
It binarySearch(It first, It last, const T &n)
{
...
}
This is a bit more advanced; you have to work within the limitations of the iterators.
Last you might want to add another template parameter, typename Comp
, so you can provide a custom comparison operator.
-
\$\begingroup\$ avoiding the
else
-statement by using the fact that theif
part already returns does that apply to the last conditional statement in the first revision of your answer, too? \$\endgroup\$greybeard– greybeard2020年11月23日 21:07:49 +00:00Commented Nov 23, 2020 at 21:07 -
2\$\begingroup\$ @greybeard You are right, but the last
else return ...
did not increase indentation levels, and in this case I think it better preserves the symmetry between the two branches. \$\endgroup\$G. Sliepen– G. Sliepen2020年11月23日 21:28:40 +00:00Commented Nov 23, 2020 at 21:28 -
\$\begingroup\$ Thanks for the great tips. I'll edit the code accordingly. \$\endgroup\$Octavian Niculescu– Octavian Niculescu2020年11月24日 07:17:20 +00:00Commented Nov 24, 2020 at 7:17
-
1\$\begingroup\$ @KonradRudolph I copied cppreference's variable names here. I guess it's a matter of preference, and at this point consistency is the most important factor to name things, so that either all your iterator variables are named
begin
/end
or all are namedfirst
/last
. \$\endgroup\$G. Sliepen– G. Sliepen2020年11月24日 14:39:07 +00:00Commented Nov 24, 2020 at 14:39 -
\$\begingroup\$ @G.Sliepen My bad, I confused them with
front
andback
. In fact, t’s not just cppreference: the standard itself uses these names for half-open iterator ranges, so this is entirely appropriate and even canonical. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2020年11月24日 14:51:05 +00:00Commented Nov 24, 2020 at 14:51
Since nobody has mentioned it:
Your code contains a famous bug. The following code may overflow the int
range for large arrays:
int i = (high + low) / 2;
To avoid this, use the following:
int i = low + (high - low) / 2;
Note that simply using a different data type isn’t enough — it just postpones the bug to larger arrays.
Another, unrelated issue relates to code readability: in the parameter list to binarySearch
, int arr[]
looks like an array declaration but (because of C++ rules), this is actually a pointer, i.e. it’s equivalent to writing int* arr
. Using the int arr[]
syntax is potentially misleading, and therefore best avoided.
To further improve readability, use a different name for i
(what does i
stand for?). mid
is frequently used here.
-
\$\begingroup\$ Ha, I wanted to mention that bug but got too riled up about the unclear parameters and then forgot :-). In what way is
int arr[]
misleading, i.e., how might someone misunderstand it and what would be the problem? (I don't know what you mean with C++ rules.) \$\endgroup\$superb rain– superb rain2020年11月24日 14:10:01 +00:00Commented Nov 24, 2020 at 14:10 -
1\$\begingroup\$ @superbrain Arrays have subtly different semantics from pointers in C and C++. The most obviously problem is that
sizeof
behaves differently. They’re also simply distinct types, so they call different overloaded functions. But the most obvious problem is that thinking an array is passed here reinforces a wrong mental model of the C++ type system, and using this false array syntax instead of pointer syntax is therefore almost exclusively done in error by beginners who genuinely think they’re passing an array (or who think that arrays and pointers are the same thing). \$\endgroup\$Konrad Rudolph– Konrad Rudolph2020年11月24日 14:17:15 +00:00Commented Nov 24, 2020 at 14:17 -
1\$\begingroup\$ Put simpler, the function parameter type is a pointer, not an array. Using a "fake-array" syntax implies the wrong type. The fact that C++ even has this misleading syntax is a historic mistake. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2020年11月24日 14:22:06 +00:00Commented Nov 24, 2020 at 14:22
-
5\$\begingroup\$ Good point, this issue is so common that C++20 introduced a function to deal with it:
std::midpoint()
. And an excellent talk to go with it here: youtube.com/watch?v=sBtAGxBh-XI \$\endgroup\$G. Sliepen– G. Sliepen2020年11月24日 14:32:09 +00:00Commented Nov 24, 2020 at 14:32 -
2\$\begingroup\$ @G.Sliepen Interesting. I wasn’t aware of this function. Unfortunately, as far as I can tell, this function is actually less efficient than
a + (b - a) / 2
(at least the implementation shown in the talk is!) — and in this particular scenario the latter works fine. I’m actually also confused that they settled for the less efficient implementation for theT*
overload. This looks like a QOI bug. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2020年11月24日 15:13:23 +00:00Commented Nov 24, 2020 at 15:13
Should have some documentation saying what it does, i.e., what the parameters mean and what the result will be. Especially what high
means, as some people make it inclusive and others make it exclusive. It's also against C++ usual name last
and apparently against C++'s usual semantic of last
being exclusive. One shouldn't have to analyze the implementation code in order to find all that out.
You might not want to test arr[i] == n
first, as that's the least likely case, so giving it the fastest route (just one test instead of two or three) is rather an anti-optimization. Btw, why check <
after you already ruled out ==
and >
? Int pairs can't fail all three. And if you use this for other types where all three can fail (for example sets with subset/superset relationships), then... you probably shouldn't be doing that anyway.
-
\$\begingroup\$ To stress the indispensability of documentation (and in-code documentation gets separated/lost from the code least easily): suppose this is in a multi-developer project. One checks what happens when the range doesn't contain the key and relies on, say, result < 0. Another one finds it useful to get the least index of an item no smaller than the key (à la Python
bisect_left()
) and modifies the code accordingly - after all, found or not can be established with a single equality check, right? (wrong: return value may behigh
) \$\endgroup\$greybeard– greybeard2020年11月24日 04:46:45 +00:00Commented Nov 24, 2020 at 4:46 -
\$\begingroup\$ Can you give me an example of inclusive vs exclusive parameters in this DAC algorithm? The other part was clear. Thanks :) \$\endgroup\$Octavian Niculescu– Octavian Niculescu2020年11月24日 07:18:29 +00:00Commented Nov 24, 2020 at 7:18
-
1\$\begingroup\$ @OctavianNiculescu Not sure what you mean. Are you just not familiar with those words? Inclusive means the range [low,high] and exclusive means [low,high). For inclusive,
arr[high]
is part of the input, for exclusive it's not (the last element is insteadarr[high - 1]
). \$\endgroup\$superb rain– superb rain2020年11月24日 10:49:29 +00:00Commented Nov 24, 2020 at 10:49
My immediate reaction on reading this is "why is it recursive"? It has a bit of a feel for a recursive algorithm because it is divide and conquer, but there is really no need to do so here. What recursion provides is an easy way to manage the context of the intermediate parts. (For example, when sorting the array a recursive solution is good because you can solve the sub parts and then combine them, and the sub parts have distinct data that has to be managed.) But the data to be managed in binary search is trivial -- namely the lowest index of the part still being searched and the upper index of the part still being searched.
Consequently, in your binary search you are paying a large cost with all those recursive calls. Remember each call requires setting up a stack frame, copying all the parameters and initializing all the variables, plus for large arrays you could potentially use a lot of stack memory. (Not likely to be a problem on regular computers, but definitely in memory limited situations.)
Here is the solution given in wikipedia. It is, you will see, pretty straightforward. The state is managed on only two variables - L and R, the leftmost and rightmost indexes of the space still being searched. Obviously you'd have to translate it into C++, but the translation is pretty straightforward.
function binary_search(A, n, T) is
L := 0
R := n − 1
while L ≤ R do
m := floor((L + R) / 2)
if A[m] < T then
L := m + 1
else if A[m] > T then
R := m − 1
else:
return m
return unsuccessful
You might consider a simple example as to why recursion is the wrong approach here. Consider another potentially recursive problem -- calculating factorial. Factorial(n) is defined as all the integers between 1..n multiplied together. (So obviously it only works for integers and only works for numbers >=1.) You can define factorial like this:
function factorial(n) is
if n == 1 return 1;
else return n * factorial(n-1);
Is this correct? Yes, for sure but it is a terrible solution. All those extra function calls make it very expensive. You can instead simply define it as:
function factorial(n) is
int total = 1;
for(int i=2; i<=n; i++) total *= i;
return total;
Now in a for loop you don't have the huge cost of n function calls. This is because the function call's purpose is to store the intermediate state, and it isn't necessary. You can store it all in one variable.
FWIW, both the recursive factorial and your binary search algorithms are tail recursive, so I suppose some really aggressive optimizer might make the conversion automatically for you. But I doubt a C++ compiler could ever be that aggressive since it requires a lot of deep information about function side effects that is pretty hard to gather, and risky to assume. (If you were using a pure functional language, for example, you would give it as you have written, and the tail recursion would generally be optimized away.)
-
\$\begingroup\$ Your factorial example actually cannot do tail recursion; it needs to modify the return value from the recursive call before it can return itself. Note that tail recursion is quite a simple optimization, most C++ compilers will happily do that. It doesn't require deep knowledge about side effects, in fact side effects are perfectly fine to have with tail recursion. \$\endgroup\$G. Sliepen– G. Sliepen2020年11月25日 07:21:54 +00:00Commented Nov 25, 2020 at 7:21
-
\$\begingroup\$ Thanks a lot for your answer. I'll look into it \$\endgroup\$Octavian Niculescu– Octavian Niculescu2020年11月25日 07:48:27 +00:00Commented Nov 25, 2020 at 7:48
-
\$\begingroup\$ "why is it recursive?" — Because the algorithm is naturally expressed using recursion. There’s really absolutely nothing wrong with that. I don’t understand why you’re mentioning the factorial here at all. It’s completely unrelated to the question. Providing a terrible implementation is just generally not a good argument. For one, the recursive binary search is not terrible. For another, naturally recursive factorial implementations don’t have to be terrible. Yours happens to be, but a straightforward recursive implementation can have the same performance as an optimal, iterative version. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2020年11月25日 12:19:03 +00:00Commented Nov 25, 2020 at 12:19
const
for two reasons: 1 readability, 2 compiler can make more assumptions if caller and this function are in separate code units. \$\endgroup\$