5
\$\begingroup\$

I've written some code to "binary search" a list, and return the first occurrence of the target:

def bsearch(a, left, right, target, first_or_last = 'first'):
 """
 Use binary search to find the first, last, or random index
 of target.
 >>> a = [1,1,1,1,1,1]
 >>> bsearch(a, 0, len(a)-1, 1, 'first')
 0
 """
 if left > right:
 return -1
 mid = (left + right)//2
 val = a[mid]
 if val == target:
 first_occur = mid
 if first_or_last == 'first'
 later_occur = bsearch(a, left, mid - 1, target) # check if there's an earlier occurrence in the left sub-list
 if later_occur != -1:
 return later_occur
 else:
 return first_occur
 elif first_or_last == 'last':
 later_occur = bsearch(a, mid + 1, right, target) 
 if later_occur != -1:
 return later_occur
 else:
 return first_occur
 else:
 return first_occur
 if target < val:
 return bsearch(a, left, mid - 1, target)
 if target > val:
 return bsearch(a, mid + 1, right, target)

Is there a more elegant/cleaner way to give this function the ability to find the first, last, or random occurrence of the target, without use of a string to represent the cases (as I've done)? Suggestions would be great.

asked Aug 13, 2014 at 0:48
\$\endgroup\$
4
  • \$\begingroup\$ Is there any reason why this can't be three functions: bsearch_first(), bsearch_last() and bsearch_random()? \$\endgroup\$ Commented Aug 13, 2014 at 4:38
  • \$\begingroup\$ @mleyfman: I'd like to avoid code duplication if possible. The only difference in these three functions is when each checks for equality and it would be great if we could generalize the search to allow for custom termination. \$\endgroup\$ Commented Aug 13, 2014 at 5:00
  • \$\begingroup\$ Last question: what do you mean by random? Do you mean a regular search? \$\endgroup\$ Commented Aug 13, 2014 at 5:20
  • \$\begingroup\$ @mleyfman: yes -- as soon as we find target in the list, we return. \$\endgroup\$ Commented Aug 13, 2014 at 11:23

2 Answers 2

4
\$\begingroup\$

You want to have the ability to choose between finding the first, last and a random occurrence of an element in a list. Since you have three options, you cannot use true/false, and numbers aren't very intuitive (and they usually end up magic numbers). The remaining options are strings, functions or some other data structure. Since you don't like strings, and other data structures wouldn't make much sense (why use a complex object when a simple one will suffice?), then let's stick to functions.

But wait, you don't want any code duplication. That's perfectly ok. We note that all 3 of these options involve variations of calling the basic binary search.

Let's create a basic binary search that outputs a few extras:

def _binary_search(array, element, low, high):
 """Binary search that returns (index, low, high) of the final search step"""
 while low <= high:
 mid = (low + high) // 2
 current_element = array[mid]
 if current_element == element:
 return (mid, low, high)
 elif current_element < element:
 low = mid + 1
 elif current_element > element:
 high = mid - 1
 return (-1, 0, 0)

Then, let's create the new functions:

def lower_bound(array, element):
 """Returns the index of the first occurrence of element in array"""
 index = -1
 first_index, low, high = _binary_search(array, element, 0, len(array)-1)
 index = first_index
 while first_index != -1:
 index = first_index
 first_index, low, high = _binary_search(array, element, low, first_index-1)
 return index
def upper_bound(array, element):
 """Returns the index of the last occurence of element in array"""
 index = -1
 first_index, low, high = _binary_search(array, element, 0, len(array)-1)
 index = first_index
 while first_index != -1:
 index = first_index
 first_index, low, high = _binary_search(array, element, first_index+1, high)
 return index
def binary_search(array, element):
 """Basic binary search"""
 lower_bound = 0
 upper_bound = len(array) - 1
 return _binary_search(array, element, lower_bound, upper_bound)[0]

This has the advantage of being readable, not susceptible to spelling errors ("First" vs "first" vs "fist") and not having much in the way of duplicated code.


As a fun fact: this entire problem can be easily solved by making wrappers around bisect_left and bisect_right from the bisect module like this very convenient SortedCollection Python recipe has done here.

Update:

  1. Renamed functions to comply with standards as per comments
  2. Changed algorithms to not create new arrays and instead do an in-place binary search
  3. Changed algorithms to produce the correct results
  4. Added reference to an easy to use Python recipe that implements this, and other features.
answered Aug 13, 2014 at 5:39
\$\endgroup\$
4
  • \$\begingroup\$ Even though it is python, I'd recommend to stick to lower_bound and upper_bound, as they are blessed by mathematicians and used in STL. \$\endgroup\$ Commented Aug 13, 2014 at 8:06
  • \$\begingroup\$ @mleyfman: I like this a lot! I was hoping there would be some solution that involved functions and I think this is good! \$\endgroup\$ Commented Aug 13, 2014 at 11:44
  • \$\begingroup\$ @mleyfman: one thing that is worth noting is that we're creating a new array each time! this means that we'll be getting indices for the new array, and not the indices of the original array we wanted to search! \$\endgroup\$ Commented Aug 13, 2014 at 22:17
  • \$\begingroup\$ Ah, forgot about that. You will need to add/subtract from the original index (subtract for lower, add for upper). I'll post an updated solution this evening. \$\endgroup\$ Commented Aug 13, 2014 at 22:19
3
\$\begingroup\$

It is quite good to know that recursion is slow in Python so if you can easily write something in a loopy way, it might be worth doing so.

Also, passing a string parameter to tell which behavior is to be performed is a bit awkward. Apparently Python has enums but I didn't manage to make it work. Thus, on my side, I used an integer parameter whose sign is used to know which value is wanted. It is still a pretty bad solution but I haven't found any better.

Except for that, I haven't much to say about your code : variable names look good, code is properly documented, etc. You might be interested in my quickly written solution :

import math
# Result :
# <0 : first
# =0 : any
# >0 : last
def binary_search(l, x, result=0):
 low, high = 0, len(l) - 1
 while low <= high:
 size = high - low
 mid_float = (low+high) / 2.
 mid = int(math.ceil(mid_float) if result > 0 else math.floor(mid_float))
 found = l[mid]
 if x < found:
 high = mid - 1
 elif x > found:
 low = mid + 1
 else:
 assert x == found
 if result==0 or low==high:
 return mid
 elif result < 0:
 high = mid
 else:
 assert result > 0
 low = mid
 assert size > high - low
 return -1
def main():
 """Main function"""
 print("Hello, world!")
 for behavior in [-1, 0, 1]:
 assert binary_search([], 0, behavior) == -1
 r = range(10)
 for i in r:
 assert binary_search(r, i, behavior) == i
 assert binary_search(r, -1, behavior) == -1
 assert binary_search(r, 10, behavior) == -1
 l = [0] * 10
 assert binary_search(l, 0, 0) != -1 
 assert binary_search(l, 0, -1) == 0
 assert binary_search(l, 0, 1) == 9
 l = [0] * 10 + [1] * 10 + [2] * 10
 assert binary_search(l, 1, 0) != -1 
 assert binary_search(l, 1, -1) == 10
 assert binary_search(l, 1, 1) == 19

(Assertions make the code longer that it should be but safer is better).

answered Aug 13, 2014 at 10:47
\$\endgroup\$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.