Binary search: first/last/random occurrence

Question 1

I've written some code to "binary search" a list, and return the first occurrence of the target:

def bsearch(a, left, right, target, first_or_last = 'first'):
 """
 Use binary search to find the first, last, or random index
 of target.
 >>> a = [1,1,1,1,1,1]
 >>> bsearch(a, 0, len(a)-1, 1, 'first')
 0
 """
 if left > right:
 return -1
 mid = (left + right)//2
 val = a[mid]
 if val == target:
 first_occur = mid
 if first_or_last == 'first'
 later_occur = bsearch(a, left, mid - 1, target) # check if there's an earlier occurrence in the left sub-list
 if later_occur != -1:
 return later_occur
 else:
 return first_occur
 elif first_or_last == 'last':
 later_occur = bsearch(a, mid + 1, right, target) 
 if later_occur != -1:
 return later_occur
 else:
 return first_occur
 else:
 return first_occur
 if target < val:
 return bsearch(a, left, mid - 1, target)
 if target > val:
 return bsearch(a, mid + 1, right, target)

Is there a more elegant/cleaner way to give this function the ability to find the first, last, or random occurrence of the target, without use of a string to represent the cases (as I've done)? Suggestions would be great.

Question 2

Is there any reason why this can't be three functions: bsearch_first(), bsearch_last() and bsearch_random()?

Question 3

@mleyfman: I'd like to avoid code duplication if possible. The only difference in these three functions is when each checks for equality and it would be great if we could generalize the search to allow for custom termination.

Question 4

Last question: what do you mean by random? Do you mean a regular search?

Question 5

@mleyfman: yes -- as soon as we find target in the list, we return.

Question 6

You want to have the ability to choose between finding the first, last and a random occurrence of an element in a list. Since you have three options, you cannot use true/false, and numbers aren't very intuitive (and they usually end up magic numbers). The remaining options are strings, functions or some other data structure. Since you don't like strings, and other data structures wouldn't make much sense (why use a complex object when a simple one will suffice?), then let's stick to functions.

But wait, you don't want any code duplication. That's perfectly ok. We note that all 3 of these options involve variations of calling the basic binary search.

Let's create a basic binary search that outputs a few extras:

def _binary_search(array, element, low, high):
 """Binary search that returns (index, low, high) of the final search step"""
 while low <= high:
 mid = (low + high) // 2
 current_element = array[mid]
 if current_element == element:
 return (mid, low, high)
 elif current_element < element:
 low = mid + 1
 elif current_element > element:
 high = mid - 1
 return (-1, 0, 0)

Then, let's create the new functions:

def lower_bound(array, element):
 """Returns the index of the first occurrence of element in array"""
 index = -1
 first_index, low, high = _binary_search(array, element, 0, len(array)-1)
 index = first_index
 while first_index != -1:
 index = first_index
 first_index, low, high = _binary_search(array, element, low, first_index-1)
 return index
def upper_bound(array, element):
 """Returns the index of the last occurence of element in array"""
 index = -1
 first_index, low, high = _binary_search(array, element, 0, len(array)-1)
 index = first_index
 while first_index != -1:
 index = first_index
 first_index, low, high = _binary_search(array, element, first_index+1, high)
 return index
def binary_search(array, element):
 """Basic binary search"""
 lower_bound = 0
 upper_bound = len(array) - 1
 return _binary_search(array, element, lower_bound, upper_bound)[0]

This has the advantage of being readable, not susceptible to spelling errors ("First" vs "first" vs "fist") and not having much in the way of duplicated code.

As a fun fact: this entire problem can be easily solved by making wrappers around bisect_left and bisect_right from the bisect module like this very convenient SortedCollection Python recipe has done here.

Update:

Renamed functions to comply with standards as per comments
Changed algorithms to not create new arrays and instead do an in-place binary search
Changed algorithms to produce the correct results
Added reference to an easy to use Python recipe that implements this, and other features.

Question 7

Even though it is python, I'd recommend to stick to lower_bound and upper_bound, as they are blessed by mathematicians and used in STL.

Question 8

@mleyfman: I like this a lot! I was hoping there would be some solution that involved functions and I think this is good!

Question 9

@mleyfman: one thing that is worth noting is that we're creating a new array each time! this means that we'll be getting indices for the new array, and not the indices of the original array we wanted to search!

Question 10

Ah, forgot about that. You will need to add/subtract from the original index (subtract for lower, add for upper). I'll post an updated solution this evening.

Question 11

It is quite good to know that recursion is slow in Python so if you can easily write something in a loopy way, it might be worth doing so.

Also, passing a string parameter to tell which behavior is to be performed is a bit awkward. Apparently Python has enums but I didn't manage to make it work. Thus, on my side, I used an integer parameter whose sign is used to know which value is wanted. It is still a pretty bad solution but I haven't found any better.

Except for that, I haven't much to say about your code : variable names look good, code is properly documented, etc. You might be interested in my quickly written solution :

import math
# Result :
# <0 : first
# =0 : any
# >0 : last
def binary_search(l, x, result=0):
 low, high = 0, len(l) - 1
 while low <= high:
 size = high - low
 mid_float = (low+high) / 2.
 mid = int(math.ceil(mid_float) if result > 0 else math.floor(mid_float))
 found = l[mid]
 if x < found:
 high = mid - 1
 elif x > found:
 low = mid + 1
 else:
 assert x == found
 if result==0 or low==high:
 return mid
 elif result < 0:
 high = mid
 else:
 assert result > 0
 low = mid
 assert size > high - low
 return -1
def main():
 """Main function"""
 print("Hello, world!")
 for behavior in [-1, 0, 1]:
 assert binary_search([], 0, behavior) == -1
 r = range(10)
 for i in r:
 assert binary_search(r, i, behavior) == i
 assert binary_search(r, -1, behavior) == -1
 assert binary_search(r, 10, behavior) == -1
 l = [0] * 10
 assert binary_search(l, 0, 0) != -1 
 assert binary_search(l, 0, -1) == 0
 assert binary_search(l, 0, 1) == 9
 l = [0] * 10 + [1] * 10 + [2] * 10
 assert binary_search(l, 1, 0) != -1 
 assert binary_search(l, 1, -1) == 10
 assert binary_search(l, 1, 1) == 19

(Assertions make the code longer that it should be but safer is better).

mleyfman mleyfman 5,2601 gold badge25 silver badges48 bronze badges · Accepted Answer · 2014-08-13 05:39:00Z

You want to have the ability to choose between finding the first, last and a random occurrence of an element in a list. Since you have three options, you cannot use true/false, and numbers aren't very intuitive (and they usually end up magic numbers). The remaining options are strings, functions or some other data structure. Since you don't like strings, and other data structures wouldn't make much sense (why use a complex object when a simple one will suffice?), then let's stick to functions.

But wait, you don't want any code duplication. That's perfectly ok. We note that all 3 of these options involve variations of calling the basic binary search.

Let's create a basic binary search that outputs a few extras:

def _binary_search(array, element, low, high):
 """Binary search that returns (index, low, high) of the final search step"""
 while low <= high:
 mid = (low + high) // 2
 current_element = array[mid]
 if current_element == element:
 return (mid, low, high)
 elif current_element < element:
 low = mid + 1
 elif current_element > element:
 high = mid - 1
 return (-1, 0, 0)

Then, let's create the new functions:

def lower_bound(array, element):
 """Returns the index of the first occurrence of element in array"""
 index = -1
 first_index, low, high = _binary_search(array, element, 0, len(array)-1)
 index = first_index
 while first_index != -1:
 index = first_index
 first_index, low, high = _binary_search(array, element, low, first_index-1)
 return index
def upper_bound(array, element):
 """Returns the index of the last occurence of element in array"""
 index = -1
 first_index, low, high = _binary_search(array, element, 0, len(array)-1)
 index = first_index
 while first_index != -1:
 index = first_index
 first_index, low, high = _binary_search(array, element, first_index+1, high)
 return index
def binary_search(array, element):
 """Basic binary search"""
 lower_bound = 0
 upper_bound = len(array) - 1
 return _binary_search(array, element, lower_bound, upper_bound)[0]

This has the advantage of being readable, not susceptible to spelling errors ("First" vs "first" vs "fist") and not having much in the way of duplicated code.

As a fun fact: this entire problem can be easily solved by making wrappers around bisect_left and bisect_right from the bisect module like this very convenient SortedCollection Python recipe has done here.

Update:

Renamed functions to comply with standards as per comments
Changed algorithms to not create new arrays and instead do an in-place binary search
Changed algorithms to produce the correct results
Added reference to an easy to use Python recipe that implements this, and other features.

Even though it is python, I'd recommend to stick to lower_bound and upper_bound, as they are blessed by mathematicians and used in STL.
@mleyfman: I like this a lot! I was hoping there would be some solution that involved functions and I think this is good!
@mleyfman: one thing that is worth noting is that we're creating a new array each time! this means that we'll be getting indices for the new array, and not the indices of the original array we wanted to search!
Ah, forgot about that. You will need to add/subtract from the original index (subtract for lower, add for upper). I'll post an updated solution this evening.

Stack Exchange Network

Binary search: first/last/random occurrence

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Binary search: first/last/random occurrence

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions