I've written some code to "binary search" a list, and return the first occurrence of the target:
def bsearch(a, left, right, target, first_or_last = 'first'):
"""
Use binary search to find the first, last, or random index
of target.
>>> a = [1,1,1,1,1,1]
>>> bsearch(a, 0, len(a)-1, 1, 'first')
0
"""
if left > right:
return -1
mid = (left + right)//2
val = a[mid]
if val == target:
first_occur = mid
if first_or_last == 'first'
later_occur = bsearch(a, left, mid - 1, target) # check if there's an earlier occurrence in the left sub-list
if later_occur != -1:
return later_occur
else:
return first_occur
elif first_or_last == 'last':
later_occur = bsearch(a, mid + 1, right, target)
if later_occur != -1:
return later_occur
else:
return first_occur
else:
return first_occur
if target < val:
return bsearch(a, left, mid - 1, target)
if target > val:
return bsearch(a, mid + 1, right, target)
Is there a more elegant/cleaner way to give this function the ability to find the first, last, or random occurrence of the target, without use of a string to represent the cases (as I've done)? Suggestions would be great.
2 Answers 2
You want to have the ability to choose between finding the first, last and a random occurrence of an element in a list. Since you have three options, you cannot use true/false, and numbers aren't very intuitive (and they usually end up magic numbers). The remaining options are strings, functions or some other data structure. Since you don't like strings, and other data structures wouldn't make much sense (why use a complex object when a simple one will suffice?), then let's stick to functions.
But wait, you don't want any code duplication. That's perfectly ok. We note that all 3 of these options involve variations of calling the basic binary search.
Let's create a basic binary search that outputs a few extras:
def _binary_search(array, element, low, high):
"""Binary search that returns (index, low, high) of the final search step"""
while low <= high:
mid = (low + high) // 2
current_element = array[mid]
if current_element == element:
return (mid, low, high)
elif current_element < element:
low = mid + 1
elif current_element > element:
high = mid - 1
return (-1, 0, 0)
Then, let's create the new functions:
def lower_bound(array, element):
"""Returns the index of the first occurrence of element in array"""
index = -1
first_index, low, high = _binary_search(array, element, 0, len(array)-1)
index = first_index
while first_index != -1:
index = first_index
first_index, low, high = _binary_search(array, element, low, first_index-1)
return index
def upper_bound(array, element):
"""Returns the index of the last occurence of element in array"""
index = -1
first_index, low, high = _binary_search(array, element, 0, len(array)-1)
index = first_index
while first_index != -1:
index = first_index
first_index, low, high = _binary_search(array, element, first_index+1, high)
return index
def binary_search(array, element):
"""Basic binary search"""
lower_bound = 0
upper_bound = len(array) - 1
return _binary_search(array, element, lower_bound, upper_bound)[0]
This has the advantage of being readable, not susceptible to spelling errors ("First" vs "first" vs "fist") and not having much in the way of duplicated code.
As a fun fact: this entire problem can be easily solved by making wrappers around bisect_left
and bisect_right
from the bisect module like this very convenient SortedCollection
Python recipe has done here.
Update:
- Renamed functions to comply with standards as per comments
- Changed algorithms to not create new arrays and instead do an in-place binary search
- Changed algorithms to produce the correct results
- Added reference to an easy to use Python recipe that implements this, and other features.
-
\$\begingroup\$ Even though it is python, I'd recommend to stick to
lower_bound
andupper_bound
, as they are blessed by mathematicians and used in STL. \$\endgroup\$vnp– vnp2014年08月13日 08:06:52 +00:00Commented Aug 13, 2014 at 8:06 -
\$\begingroup\$ @mleyfman: I like this a lot! I was hoping there would be some solution that involved functions and I think this is good! \$\endgroup\$rookie– rookie2014年08月13日 11:44:09 +00:00Commented Aug 13, 2014 at 11:44
-
\$\begingroup\$ @mleyfman: one thing that is worth noting is that we're creating a new array each time! this means that we'll be getting indices for the new array, and not the indices of the original array we wanted to search! \$\endgroup\$rookie– rookie2014年08月13日 22:17:23 +00:00Commented Aug 13, 2014 at 22:17
-
\$\begingroup\$ Ah, forgot about that. You will need to add/subtract from the original index (subtract for lower, add for upper). I'll post an updated solution this evening. \$\endgroup\$mleyfman– mleyfman2014年08月13日 22:19:10 +00:00Commented Aug 13, 2014 at 22:19
It is quite good to know that recursion is slow in Python so if you can easily write something in a loopy way, it might be worth doing so.
Also, passing a string parameter to tell which behavior is to be performed is a bit awkward. Apparently Python has enums but I didn't manage to make it work. Thus, on my side, I used an integer parameter whose sign is used to know which value is wanted. It is still a pretty bad solution but I haven't found any better.
Except for that, I haven't much to say about your code : variable names look good, code is properly documented, etc. You might be interested in my quickly written solution :
import math
# Result :
# <0 : first
# =0 : any
# >0 : last
def binary_search(l, x, result=0):
low, high = 0, len(l) - 1
while low <= high:
size = high - low
mid_float = (low+high) / 2.
mid = int(math.ceil(mid_float) if result > 0 else math.floor(mid_float))
found = l[mid]
if x < found:
high = mid - 1
elif x > found:
low = mid + 1
else:
assert x == found
if result==0 or low==high:
return mid
elif result < 0:
high = mid
else:
assert result > 0
low = mid
assert size > high - low
return -1
def main():
"""Main function"""
print("Hello, world!")
for behavior in [-1, 0, 1]:
assert binary_search([], 0, behavior) == -1
r = range(10)
for i in r:
assert binary_search(r, i, behavior) == i
assert binary_search(r, -1, behavior) == -1
assert binary_search(r, 10, behavior) == -1
l = [0] * 10
assert binary_search(l, 0, 0) != -1
assert binary_search(l, 0, -1) == 0
assert binary_search(l, 0, 1) == 9
l = [0] * 10 + [1] * 10 + [2] * 10
assert binary_search(l, 1, 0) != -1
assert binary_search(l, 1, -1) == 10
assert binary_search(l, 1, 1) == 19
(Assertions make the code longer that it should be but safer is better).
bsearch_first()
,bsearch_last()
andbsearch_random()
? \$\endgroup\$target
in the list, we return. \$\endgroup\$