Binary search with self-test

Question 1

Here is an implementation of binary search. Please give feedback on any part but a few specific areas I was wondering about.

Is size_t used appropriately or should int or unsigned int be used instead?
How can I shorten selfTest() and in general make it better? I know it's clunky how it returns a value and prints an error message to std::cout.

#include <iostream>
#include <cmath>
/*function prototypes*/
int selfTest();
int binarySearch(const int data[], const size_t length, const int findMe);
int main()
{
 auto failed = selfTest();
 std::cout << "failed tests: " << failed << std::endl;
 return 0;
}
int binarySearch(const int data[], const size_t length, const int findMe)
{
 size_t start = 0;
 size_t end = length - 1;
 while(start <= end)
 {
 const auto mid = start + ((end - start) / 2);
 if(data[mid] < findMe)
 {
 start = mid + 1;
 }
 else if(data[mid] > findMe)
 {
 end = mid - 1;
 }
 else
 {
 return mid;
 }
 }
 
 
 return -1;
}
int selfTest()
{
 auto failed = 0;
 
 int test1[7] = {1,2,3,4,5,6,7};
 int test2[8] = {1,2,3,4,5,6,7,8};
 int test3[7] = {1,2,3,4,5,7,7};
 int test4[8] = {1,2,3,4,5,7,8,8};
 int test5[3] = {1,2,3};
 int test6[3] = {5,5,5};
 int test7[5] = {-10, -2, 5, 6, 10};
 
 int result;
 result = binarySearch(test1, 7, 4);
 if(test1[result] != 4)
 {
 std::cout << "Test A failed. Expected 2, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test1, 7, 1);
 if(result != 0)
 {
 std::cout << "Test B failed. Expected 0, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test1, 7, 2);
 if(result != 1)
 {
 std::cout << "Test C failed. Expected 1, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test2, 8, 5);
 if(result != 4)
 {
 std::cout << "Test D failed. Expected 4, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test3, 7, 7);
 if(test3[result] != 7)
 {
 std::cout << "Test E failed. Expected 5, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test4, 8, 8);
 if(result != 6 && result != 7)
 {
 std::cout << "Test F failed. Expected 6 or 7, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test5, 3, 1);
 if(result != 0)
 {
 std::cout << "Test G failed. Expected 0, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test6, 3, 5);
 if(result != 0 && result != 1 && result != 2)
 {
 std::cout << "Test H failed. Expected 0 or 1, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test6, 3, 0);
 if(result != -1)
 {
 std::cout << "Test I failed. Expected -1, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test7, 5, -10);
 if(test7[result] != -10)
 {
 std::cout << "Test J failed. Expected 0, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test7, 5, -2);
 if(test7[result] != -2)
 {
 std::cout << "Test K failed. Expected 1, got " << result << std::endl;
 failed++;
 }
 
 result = binarySearch(test7, 5, 6);
 if(test7[result] != 6)
 {
 std::cout << "Test L failed. Expected 3, got " << result << std::endl;
 failed++;
 }
 
 
 
 
 
 return failed;
}

Question 2

/*function prototypes*/
A C comment, bearing C nomenclature?
C++ doesn't have "prototypes". Declaring your functions at the top is not helpful since C++ supports overloading, and any difference between this declaration and the later definition will not cause any error at compile time but a confusing link-time error.

Just define the functions in the right order. Put main at the bottom, and define the function before you call it.

int binarySearch(const int data[], const size_t length, const int findMe)

⧺I.13 Do not pass an array as a single pointer (includes pointer and count parameters in the discussion)
⧺R.14 Avoid [] parameters, prefer span
⧺F.24 Use a span<T> or a span_p<T> to designate a half-open sequence

It would be more normal to do this how the standard library functions do it: take a pair of iterators, not a starting position and length.

If you do pass a contiguous sequence of values in memory rather than being more general, don't use [] as an alias for pointer (what it means in a parameter list), and don't pass the starting position and length as separate parameters.

I see you used top-level const on the length and findMe parameters. That's fine (though often omitted), but also showed them on the function declaration at the top of the file. If you do have a separate function declaration (say, in a header file), note that the top-level const is not part of the contract and does not affect the caller. Its presence or absence in the definition does not change the function's signature. So, where you only declare the function, don't include such top-level const modifiers. Where you define the function, you can use them as befits the implementation of the function.

⧺SL.io.50 Don't use endl.

The Good

I want to acknowledge some of the good practices and decisions found in the code as well.

You made the search a separate function rather than stuffing everything inside main, and made it reusable for the real use and testing. That's something I'm often trying to get across to newcomers, and it seems to be missing from so many class curriculums!
You declared mid locally inside the loop, where needed, and made it const. Again, that's something I'm always trying to impress upon students.
Use of const in general. This deserves double bonus points.

test code structure

The test code is highly repetitive. You should not repeat the same boilerplate over and over. Rather, put the data in an array and loop through it.

 const auto result = binarySearch(data.arr, data.findme);
 if(data.arr[result] != data.findme)
 {
 std::cout << .... ;
 failed++;
 }

You can make an array of simple structures and use a range-based for loop to go through it.

constexpr int TestArr1[] = {1,2,3,4,5,6,7};
constexpr int TestArr2[] = {1,2,2,2,2,3};
constexpr struct {
 std::span<int> arr;
 int findme;
 } data[] = {
 { TestArr1, 3 },
 ⋮
};

You can also look at testing frameworks, such as Catch2.

Question 3

"C++ doesn't have "prototypes" what makes you say this? I see lots of places that use the word "prototype" with C++.

Question 4

@northerner not in the ISO standard. It's a C term, introduced to distinguish legacy function declarations from the new style that tells the compiler what to expect. In C++, it only ever had declarations that include the parameters. Someone using that word with respect to C++ means either "function declaration" or "signature" or something else not clearly communicated.

Question 5

@northerner and of course it's never used in the original Strustrup C++ book, the Annotated Reference, and other early books, as they predate ANSI C.

Question 6

size_t is in the std namespace, so should be written std::size_t.

We don't need to include <cmath>.

main() should return non-zero if any of the tests failed.

Length+size is one way to pass a collection (and yes, std::size_t is the appropriate type for the size here); a more flexible (but more advanced) way is to write a general template that accepts a C++ Range object.

If we're returning an index, that also should be a std::size_t, as int might not be sufficient. However, that leaves a problem - we can no longer use -1 to indicate that the value is before the 0th element of the array. Our options for that case include:

return (std::size_t)-1
add 1 to the result (so we return the position after the found position, and caller must subtract 1 to use it)
return the index of the first element greater or equal to target (caller can check if it's equal or not) - this is the convention used by std::lower_bound()
throw an exception (this is expensive, so not a good choice for a "normal" result)
return a std::optional.

I recommend the third option, for the reason I've highlighted.

 const auto mid = start + ((end - start) / 2);

That's good - you've avoided a common mistake there. (If we were to write (start + end) / 2, we would be at risk of integer overflow).

As you observed, the test cases are very repetitive. We can extract the common parts to a separate function. My function looks something like this (most of the complexity is in giving good diagnostic output):

#include <vector>
static int test_binary_search(const std::vector<int>& v,
 int target,
 std::size_t min_result,
 std::size_t max_result = 0)
{
 if (max_result == 0) {
 // fill in optional argument
 max_result = min_result;
 }
 auto const result = binarySearch(v.data(), v.size(), target);
 if (min_result <= result && result <= max_result) {
 // test passed
 return 0;
 }
 // We failed - emit some diagnostics
 std::cerr << "Test failed: search for " << target
 << " in {";
 auto *sep = "";
 for (auto i: v) {
 std::cerr << sep << i;
 sep = ",";
 }
 std::cerr << "} returned " << result
 << "; expected " << min_result;
 if (min_result != max_result) {
 std::cerr << "-" << max_result;
 }
 std::cerr << '\n';
 return 1;
}

Then it's very easy to use:

int selfTest()
{
 return test_binary_search({1,2,3,4,5,6,7}, 4, 3)
 + test_binary_search({1,2,3,4,5,6,7}, 1, 0)
 + test_binary_search({1,2,3,4,5,6,7}, 7, 6)
 + test_binary_search({1,2,2,2,2,3}, 2, 1, 4)
 + test_binary_search({-10, -2, 5, 6, 10}, -2, 1);
}

A couple of things I fixed in passing: error messages should go to std::cerr rather than std::cout, and prefer plain newline (\n) rather than std::endl (which also flushes the output stream).

Having the test function print the inputs and expected result range helps avoid inconsistencies like this one:

result = binarySearch(test1, 7, 4);
if(test1[result] != 4)
{
 std::cout << "Test A failed. Expected 2, got " << result << std::endl;
 failed++;
}

(In fact, almost all the tests have wrong "Expected" value in the string).

Question 7

Regarding the "that leaves a problem": If the value isn't found, would throwing an exception be a good solution?

Question 8

Good point - that's another option. I'll edit that in.

Question 9

This made me wonder, any time an array is being iterated should the type be std::size_t or unsigned int?

Question 10

I'd always recommend std::size_t (unless there's a good reason we can guarantee that the result will always be no more than UINT_MAX - which could be as small as 65535).

Question 11

Am I including the correct libraries for std::size_t ?

JDługosz JDługoszJDługosz 11.7k18 silver badges40 bronze badges · Answer 1 · 2021-11-02 15:15:29Z

/*function prototypes*/
A C comment, bearing C nomenclature?
C++ doesn't have "prototypes". Declaring your functions at the top is not helpful since C++ supports overloading, and any difference between this declaration and the later definition will not cause any error at compile time but a confusing link-time error.

Just define the functions in the right order. Put main at the bottom, and define the function before you call it.

int binarySearch(const int data[], const size_t length, const int findMe)

⧺I.13 Do not pass an array as a single pointer (includes pointer and count parameters in the discussion)
⧺R.14 Avoid [] parameters, prefer span
⧺F.24 Use a span<T> or a span_p<T> to designate a half-open sequence

It would be more normal to do this how the standard library functions do it: take a pair of iterators, not a starting position and length.

If you do pass a contiguous sequence of values in memory rather than being more general, don't use [] as an alias for pointer (what it means in a parameter list), and don't pass the starting position and length as separate parameters.

I see you used top-level const on the length and findMe parameters. That's fine (though often omitted), but also showed them on the function declaration at the top of the file. If you do have a separate function declaration (say, in a header file), note that the top-level const is not part of the contract and does not affect the caller. Its presence or absence in the definition does not change the function's signature. So, where you only declare the function, don't include such top-level const modifiers. Where you define the function, you can use them as befits the implementation of the function.

⧺SL.io.50 Don't use endl.

The Good

I want to acknowledge some of the good practices and decisions found in the code as well.

You made the search a separate function rather than stuffing everything inside main, and made it reusable for the real use and testing. That's something I'm often trying to get across to newcomers, and it seems to be missing from so many class curriculums!
You declared mid locally inside the loop, where needed, and made it const. Again, that's something I'm always trying to impress upon students.
Use of const in general. This deserves double bonus points.

test code structure

The test code is highly repetitive. You should not repeat the same boilerplate over and over. Rather, put the data in an array and loop through it.

 const auto result = binarySearch(data.arr, data.findme);
 if(data.arr[result] != data.findme)
 {
 std::cout << .... ;
 failed++;
 }

You can make an array of simple structures and use a range-based for loop to go through it.

constexpr int TestArr1[] = {1,2,3,4,5,6,7};
constexpr int TestArr2[] = {1,2,2,2,2,3};
constexpr struct {
 std::span<int> arr;
 int findme;
 } data[] = {
 { TestArr1, 3 },
 ⋮
};

You can also look at testing frameworks, such as Catch2.

"C++ doesn't have "prototypes" what makes you say this? I see lots of places that use the word "prototype" with C++.
@northerner not in the ISO standard. It's a C term, introduced to distinguish legacy function declarations from the new style that tells the compiler what to expect. In C++, it only ever had declarations that include the parameters. Someone using that word with respect to C++ means either "function declaration" or "signature" or something else not clearly communicated.
@northerner and of course it's never used in the original Strustrup C++ book, the Annotated Reference, and other early books, as they predate ANSI C.

score 1 · Answer 2 · 2021-11-02 09:41:29Z

size_t is in the std namespace, so should be written std::size_t.

We don't need to include <cmath>.

main() should return non-zero if any of the tests failed.

Length+size is one way to pass a collection (and yes, std::size_t is the appropriate type for the size here); a more flexible (but more advanced) way is to write a general template that accepts a C++ Range object.

If we're returning an index, that also should be a std::size_t, as int might not be sufficient. However, that leaves a problem - we can no longer use -1 to indicate that the value is before the 0th element of the array. Our options for that case include:

return (std::size_t)-1
add 1 to the result (so we return the position after the found position, and caller must subtract 1 to use it)
return the index of the first element greater or equal to target (caller can check if it's equal or not) - this is the convention used by std::lower_bound()
throw an exception (this is expensive, so not a good choice for a "normal" result)
return a std::optional.

I recommend the third option, for the reason I've highlighted.

 const auto mid = start + ((end - start) / 2);

That's good - you've avoided a common mistake there. (If we were to write (start + end) / 2, we would be at risk of integer overflow).

As you observed, the test cases are very repetitive. We can extract the common parts to a separate function. My function looks something like this (most of the complexity is in giving good diagnostic output):

#include <vector>
static int test_binary_search(const std::vector<int>& v,
 int target,
 std::size_t min_result,
 std::size_t max_result = 0)
{
 if (max_result == 0) {
 // fill in optional argument
 max_result = min_result;
 }
 auto const result = binarySearch(v.data(), v.size(), target);
 if (min_result <= result && result <= max_result) {
 // test passed
 return 0;
 }
 // We failed - emit some diagnostics
 std::cerr << "Test failed: search for " << target
 << " in {";
 auto *sep = "";
 for (auto i: v) {
 std::cerr << sep << i;
 sep = ",";
 }
 std::cerr << "} returned " << result
 << "; expected " << min_result;
 if (min_result != max_result) {
 std::cerr << "-" << max_result;
 }
 std::cerr << '\n';
 return 1;
}

Then it's very easy to use:

int selfTest()
{
 return test_binary_search({1,2,3,4,5,6,7}, 4, 3)
 + test_binary_search({1,2,3,4,5,6,7}, 1, 0)
 + test_binary_search({1,2,3,4,5,6,7}, 7, 6)
 + test_binary_search({1,2,2,2,2,3}, 2, 1, 4)
 + test_binary_search({-10, -2, 5, 6, 10}, -2, 1);
}

A couple of things I fixed in passing: error messages should go to std::cerr rather than std::cout, and prefer plain newline (\n) rather than std::endl (which also flushes the output stream).

Having the test function print the inputs and expected result range helps avoid inconsistencies like this one:

result = binarySearch(test1, 7, 4);
if(test1[result] != 4)
{
 std::cout << "Test A failed. Expected 2, got " << result << std::endl;
 failed++;
}

(In fact, almost all the tests have wrong "Expected" value in the string).

Regarding the "that leaves a problem": If the value isn't found, would throwing an exception be a good solution?
This made me wonder, any time an array is being iterated should the type be std::size_t or unsigned int?
I'd always recommend std::size_t (unless there's a good reason we can guarantee that the result will always be no more than UINT_MAX - which could be as small as 65535).

Stack Exchange Network

Binary search with self-test

2 Answers 2

The Good

test code structure

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Binary search with self-test

2 Answers 2

The Good

test code structure

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions