Here is an implementation of binary search. Please give feedback on any part but a few specific areas I was wondering about.
- Is
size_t
used appropriately or shouldint
orunsigned int
be used instead? - How can I shorten
selfTest()
and in general make it better? I know it's clunky how it returns a value and prints an error message tostd::cout
.
#include <iostream>
#include <cmath>
/*function prototypes*/
int selfTest();
int binarySearch(const int data[], const size_t length, const int findMe);
int main()
{
auto failed = selfTest();
std::cout << "failed tests: " << failed << std::endl;
return 0;
}
int binarySearch(const int data[], const size_t length, const int findMe)
{
size_t start = 0;
size_t end = length - 1;
while(start <= end)
{
const auto mid = start + ((end - start) / 2);
if(data[mid] < findMe)
{
start = mid + 1;
}
else if(data[mid] > findMe)
{
end = mid - 1;
}
else
{
return mid;
}
}
return -1;
}
int selfTest()
{
auto failed = 0;
int test1[7] = {1,2,3,4,5,6,7};
int test2[8] = {1,2,3,4,5,6,7,8};
int test3[7] = {1,2,3,4,5,7,7};
int test4[8] = {1,2,3,4,5,7,8,8};
int test5[3] = {1,2,3};
int test6[3] = {5,5,5};
int test7[5] = {-10, -2, 5, 6, 10};
int result;
result = binarySearch(test1, 7, 4);
if(test1[result] != 4)
{
std::cout << "Test A failed. Expected 2, got " << result << std::endl;
failed++;
}
result = binarySearch(test1, 7, 1);
if(result != 0)
{
std::cout << "Test B failed. Expected 0, got " << result << std::endl;
failed++;
}
result = binarySearch(test1, 7, 2);
if(result != 1)
{
std::cout << "Test C failed. Expected 1, got " << result << std::endl;
failed++;
}
result = binarySearch(test2, 8, 5);
if(result != 4)
{
std::cout << "Test D failed. Expected 4, got " << result << std::endl;
failed++;
}
result = binarySearch(test3, 7, 7);
if(test3[result] != 7)
{
std::cout << "Test E failed. Expected 5, got " << result << std::endl;
failed++;
}
result = binarySearch(test4, 8, 8);
if(result != 6 && result != 7)
{
std::cout << "Test F failed. Expected 6 or 7, got " << result << std::endl;
failed++;
}
result = binarySearch(test5, 3, 1);
if(result != 0)
{
std::cout << "Test G failed. Expected 0, got " << result << std::endl;
failed++;
}
result = binarySearch(test6, 3, 5);
if(result != 0 && result != 1 && result != 2)
{
std::cout << "Test H failed. Expected 0 or 1, got " << result << std::endl;
failed++;
}
result = binarySearch(test6, 3, 0);
if(result != -1)
{
std::cout << "Test I failed. Expected -1, got " << result << std::endl;
failed++;
}
result = binarySearch(test7, 5, -10);
if(test7[result] != -10)
{
std::cout << "Test J failed. Expected 0, got " << result << std::endl;
failed++;
}
result = binarySearch(test7, 5, -2);
if(test7[result] != -2)
{
std::cout << "Test K failed. Expected 1, got " << result << std::endl;
failed++;
}
result = binarySearch(test7, 5, 6);
if(test7[result] != 6)
{
std::cout << "Test L failed. Expected 3, got " << result << std::endl;
failed++;
}
return failed;
}
2 Answers 2
/*function prototypes*/
A C comment, bearing C nomenclature?
C++ doesn't have "prototypes". Declaring your functions at the top is not helpful since C++ supports overloading, and any difference between this declaration and the later definition will not cause any error at compile time but a confusing link-time error.
Just define the functions in the right order. Put main
at the bottom, and define the function before you call it.
int binarySearch(const int data[], const size_t length, const int findMe)
⧺I.13 Do not pass an array as a single pointer (includes pointer and count parameters in the discussion)
⧺R.14 Avoid
[]
parameters, preferspan
⧺F.24 Use a
span<T>
or aspan_p<T>
to designate a half-open sequence
It would be more normal to do this how the standard library functions do it: take a pair of iterators, not a starting position and length.
If you do pass a contiguous sequence of values in memory rather than being more general, don't use []
as an alias for pointer (what it means in a parameter list), and don't pass the starting position and length as separate parameters.
I see you used top-level const
on the length
and findMe
parameters. That's fine (though often omitted), but also showed them on the function declaration at the top of the file. If you do have a separate function declaration (say, in a header file), note that the top-level const
is not part of the contract and does not affect the caller. Its presence or absence in the definition does not change the function's signature. So, where you only declare the function, don't include such top-level const
modifiers. Where you define the function, you can use them as befits the implementation of the function.
⧺SL.io.50 Don't use endl
.
The Good
I want to acknowledge some of the good practices and decisions found in the code as well.
You made the search a separate function rather than stuffing everything inside
main
, and made it reusable for the real use and testing. That's something I'm often trying to get across to newcomers, and it seems to be missing from so many class curriculums!You declared
mid
locally inside the loop, where needed, and made itconst
. Again, that's something I'm always trying to impress upon students.Use of
const
in general. This deserves double bonus points.
test code structure
The test code is highly repetitive. You should not repeat the same boilerplate over and over. Rather, put the data in an array and loop through it.
const auto result = binarySearch(data.arr, data.findme);
if(data.arr[result] != data.findme)
{
std::cout << .... ;
failed++;
}
You can make an array of simple structures and use a range-based for
loop to go through it.
constexpr int TestArr1[] = {1,2,3,4,5,6,7};
constexpr int TestArr2[] = {1,2,2,2,2,3};
constexpr struct {
std::span<int> arr;
int findme;
} data[] = {
{ TestArr1, 3 },
⋮
};
You can also look at testing frameworks, such as Catch2.
-
\$\begingroup\$ "C++ doesn't have "prototypes" what makes you say this? I see lots of places that use the word "prototype" with C++. \$\endgroup\$northerner– northerner2021年12月06日 12:18:40 +00:00Commented Dec 6, 2021 at 12:18
-
\$\begingroup\$ @northerner not in the ISO standard. It's a C term, introduced to distinguish legacy function declarations from the new style that tells the compiler what to expect. In C++, it only ever had declarations that include the parameters. Someone using that word with respect to C++ means either "function declaration" or "signature" or something else not clearly communicated. \$\endgroup\$JDługosz– JDługosz2021年12月06日 15:01:16 +00:00Commented Dec 6, 2021 at 15:01
-
\$\begingroup\$ @northerner and of course it's never used in the original Strustrup C++ book, the Annotated Reference, and other early books, as they predate ANSI C. \$\endgroup\$JDługosz– JDługosz2021年12月06日 15:03:18 +00:00Commented Dec 6, 2021 at 15:03
size_t
is in the std
namespace, so should be written std::size_t
.
We don't need to include <cmath>
.
main()
should return non-zero if any of the tests failed.
Length+size is one way to pass a collection (and yes, std::size_t
is the appropriate type for the size here); a more flexible (but more advanced) way is to write a general template that accepts a C++ Range object.
If we're returning an index, that also should be a std::size_t
, as int
might not be sufficient. However, that leaves a problem - we can no longer use -1
to indicate that the value is before the 0th element of the array. Our options for that case include:
- return
(std::size_t)-1
- add 1 to the result (so we return the position after the found position, and caller must subtract 1 to use it)
- return the index of the first element greater or equal to target (caller can check if it's equal or not) - this is the convention used by
std::lower_bound()
- throw an exception (this is expensive, so not a good choice for a "normal" result)
- return a
std::optional
.
I recommend the third option, for the reason I've highlighted.
const auto mid = start + ((end - start) / 2);
That's good - you've avoided a common mistake there. (If we were to write (start + end) / 2
, we would be at risk of integer overflow).
As you observed, the test cases are very repetitive. We can extract the common parts to a separate function. My function looks something like this (most of the complexity is in giving good diagnostic output):
#include <vector>
static int test_binary_search(const std::vector<int>& v,
int target,
std::size_t min_result,
std::size_t max_result = 0)
{
if (max_result == 0) {
// fill in optional argument
max_result = min_result;
}
auto const result = binarySearch(v.data(), v.size(), target);
if (min_result <= result && result <= max_result) {
// test passed
return 0;
}
// We failed - emit some diagnostics
std::cerr << "Test failed: search for " << target
<< " in {";
auto *sep = "";
for (auto i: v) {
std::cerr << sep << i;
sep = ",";
}
std::cerr << "} returned " << result
<< "; expected " << min_result;
if (min_result != max_result) {
std::cerr << "-" << max_result;
}
std::cerr << '\n';
return 1;
}
Then it's very easy to use:
int selfTest()
{
return test_binary_search({1,2,3,4,5,6,7}, 4, 3)
+ test_binary_search({1,2,3,4,5,6,7}, 1, 0)
+ test_binary_search({1,2,3,4,5,6,7}, 7, 6)
+ test_binary_search({1,2,2,2,2,3}, 2, 1, 4)
+ test_binary_search({-10, -2, 5, 6, 10}, -2, 1);
}
A couple of things I fixed in passing: error messages should go to std::cerr
rather than std::cout
, and prefer plain newline (\n
) rather than std::endl
(which also flushes the output stream).
Having the test function print the inputs and expected result range helps avoid inconsistencies like this one:
result = binarySearch(test1, 7, 4); if(test1[result] != 4) { std::cout << "Test A failed. Expected 2, got " << result << std::endl; failed++; }
(In fact, almost all the tests have wrong "Expected" value in the string).
-
\$\begingroup\$ Regarding the "that leaves a problem": If the value isn't found, would throwing an exception be a good solution? \$\endgroup\$northerner– northerner2021年11月05日 09:48:25 +00:00Commented Nov 5, 2021 at 9:48
-
\$\begingroup\$ Good point - that's another option. I'll edit that in. \$\endgroup\$Toby Speight– Toby Speight2021年11月05日 10:29:13 +00:00Commented Nov 5, 2021 at 10:29
-
\$\begingroup\$ This made me wonder, any time an array is being iterated should the type be
std::size_t
orunsigned int
? \$\endgroup\$northerner– northerner2021年11月12日 07:38:50 +00:00Commented Nov 12, 2021 at 7:38 -
\$\begingroup\$ I'd always recommend
std::size_t
(unless there's a good reason we can guarantee that the result will always be no more thanUINT_MAX
- which could be as small as 65535). \$\endgroup\$Toby Speight– Toby Speight2021年11月12日 07:56:44 +00:00Commented Nov 12, 2021 at 7:56 -
\$\begingroup\$ Am I including the correct libraries for
std::size_t
? \$\endgroup\$northerner– northerner2021年12月06日 12:40:59 +00:00Commented Dec 6, 2021 at 12:40