Is the following implementation of recursive function correct? How about the time complexity of the code; is it good in terms of performance?
def binary_search(arr, num):
# find the middle index
# compare the middle index to the number
# if the number is greater than the middle index,
# search the right half of the list
# if the number is less than the middle index,
# search the left half of the list
# if the number is equal to the middle index,
# return the index
# if the number is not in the list,
# return -1
start = 0
end = len(arr) - 1
lookup = int(len(arr) / 2)
def search(arr, num):
nonlocal start, end, lookup
# base case
if len(arr) == 0:
return -1
if end == lookup and arr[lookup] != num:
lookup = -1
return
if num != arr[lookup] and num > arr[lookup]:
start = lookup + 1
lookup = int((start + end) / 2)
search(arr, num)
elif num != arr[lookup] and num < arr[lookup]:
end = lookup
lookup = int((start + end) / 2)
search(arr, num)
search(arr, num)
return lookup
print(binary_search([x for x in range(1, 10000)], 8))
# prints the index of the num if found, 7
-
\$\begingroup\$ When you ask about correctness and performance, you should really show us the testing you have done so far. Consider including the unit tests, and show us how you conducted your performance tests. Thanks. \$\endgroup\$Toby Speight– Toby Speight2021年12月19日 08:25:17 +00:00Commented Dec 19, 2021 at 8:25
2 Answers 2
A function's calls must be consistent with its implementation. The search()
helper function is organized to return -1 under some conditions. However,
nowhere do you capture the return value. That does not make sense. Functions
are either called for their side effects (e.g., modifying the value of the
non-local lookup
) or they are called for their return value (in which case,
you'd better capture it).
Base cases should not check unchanging information. When should we bail out
of search()
? Not when the list is empty: the list never changes size, so
emptiness should be checked before search()
is ever called. Bailing out of a
recursion should be driven by something that is changed as the function is
being called -- namely, when start
and end
cross each other, or when we
actually find the target number.
Global variables are never the answer. You're using functions, which is
good. However, you are undermining their power by resorting to nonlocal
. In
this case, you are using this mechanism to create the effect of a global
variable within the scope of the binary_search()
function. In well over a
decade of Python programming, I can remember only one use of global
or
nonlocal
, and that was for unusual circumstances where I was operating under
the constraints of a legacy code base. They just aren't needed under regular
circumstances, and they have a wide range of negative effects on software.
What's the solution? Pass start
and end
as arguments to search()
, and
compute lookup
only inside in function, never outside. The benefit of the
latter is that you need to compute its value in only one spot, not multiple. If
you're ever working on a recursive implementation and find yourself computing the
same values both outside and inside the recursive function, that's often a sign
that you have not organized the recursive logic well.
Use convenience variables to aid readability. The search() function uses
arr[lookup]
several times. A simple convenience variable can simplify
and lighten up the code, enhancing readability.
Select descriptive and thematic variable names. Although num
is good in
telling us that we're dealing with a number, it's still pretty vague. Much
better is a term that conveys the concept of something that is being searched
for: target
is one option. Similarly, lookup
is not a terrible variable
name, but it's also vague in the sense that it doesn't clarify whether lookup
is an index or value from the list. A more helpful name would build on the
theme you've already established with start
and end
: options are
midpoint
, middle
, or even just mid
. Finally, since this is a general
purpose function, it's not easy to pick a good name for the list of values. In
that context, arr
is fine. But a better name can be had by drawing on a
convention often seen in functional programming, where an arbitrary value can
be represented by a variable like x
and a collection of such values can be
represented by pluralizing the variable name to become xs
. Such names are
effective because convey (a) our lack of knowledge about the details of the
thing behind the variable, and (b) the connection between the collection (xs
)
and a value from it (x
).
Algorithmic code needs testing. Donald Knuth was a genius on the topic of
algorithms, and yet even he appreciated the difficulty of implementing
simple algorithms when he wrote the following: "although the basic idea of
binary search is comparatively straightforward, the details can be surprisingly
tricky". The way to avoid pitfalls is to organize your code from the beginning
to support testing. The best way to do that is to learn how to use one of the
Python testing frameworks, but if you're just learning or want a practical,
quick-and-dirty approach when working on programming exercises like this, you
can mimic the approach taken in main()
below: define a collection of test
cases (inputs and expected output); iterate over the collection; confirm that
the return value matches expectations.
def main():
tests = (
([], 99, -1),
([1, 2], 1, 0),
([1, 2, 4], 4, 2),
([1, 2, 3, 4], 4, 3),
([2, 4, 5, 99, 100], 5, 2),
(tuple(range(1, 1000)), 50, 49),
)
for xs, target, exp in tests:
got = binary_search(xs, target)
if got == exp:
print('ok', target)
else:
print('FAIL', target, exp)
def binary_search(xs, target):
def search(start, end):
mid = (end + start) // 2
x = xs[mid]
if x == target:
return mid
elif end <= start:
return -1
elif target > x:
return search(mid + 1, end)
else:
return search(start, mid - 1)
return search(0, len(xs) - 1) if xs else -1
if __name__ == '__main__':
main()
Why bother with recursion?. I would not use recursion for
a simple case like this: check for emptiness; initialize start
and
end
; and enter a while-true loop, modifying start and end inside
the loop rather than recursing. At a minimum, it's worth implementing
a non-recursive version to compare the two and see what you think.
-
\$\begingroup\$ I would convert your main into
doctests
otherwise fantastic answer! \$\endgroup\$N3buchadnezzar– N3buchadnezzar2021年12月20日 00:42:18 +00:00Commented Dec 20, 2021 at 0:42 -
\$\begingroup\$ @N3buchadnezzar I appreciate the comment -- thank you. I guess I'm too old school, but I really dislike writing test code within the constraints of documentation markup language. My IDE (a Python-pimped version of Vim) doesn't know what to make of it. But for the young kids, maybe it's a good idea! \$\endgroup\$FMc– FMc2021年12月20日 00:48:54 +00:00Commented Dec 20, 2021 at 0:48
-
1\$\begingroup\$ Using
neovim
here. Doctests work just fine with a proper linter ^^ \$\endgroup\$N3buchadnezzar– N3buchadnezzar2021年12月20日 02:03:23 +00:00Commented Dec 20, 2021 at 2:03 -
1\$\begingroup\$ @FMc, thank you for the time and well explained answer, appreciate it. I will do my best to improve. \$\endgroup\$Siawash Kasra– Siawash Kasra2021年12月20日 11:47:15 +00:00Commented Dec 20, 2021 at 11:47
Is the following implementation of recursive function correct?
No, for example for binary_search([1, 2], 1)
the result is -1
.
Explore related questions
See similar questions with these tags.