I'm working on an implementation of binary search (in Python) that can operate on a sorted array that may have duplicate elements.
I've written a submission that has managed to hold up against my test cases so far, but I have the feeling that there may be a more elegant way to write my recursive solution.
If anyone has any critiques and advice on what I can do to improve the following solution, I would greatly appreciate it.
In response to @SylvainD, I've rewritten my code. I had some ideas on how to improve it in the past several hours, and I've included some new test cases based on the array @SylvainD provided in their comment.
It's passing every test case I have so far except for when I ask the function to search for all occurrences of '6' in the array provided by @SylvainD.
For some reason, it's stopping short of the last index.
def binary_search(keys, target, start = 0, end = None):
'''
Searches the array, 'keys', for an integer, 'target'. Each
call of the method also takes a 'start' and 'end' argument,
specifying the index to start and end the search on.
If the target is found to be in the keys array, the index of
its location is returned. If not, a -1 is returned.
'''
if end == None: # If end is set to a default argument of 'None', then set it to the index of the last element in the array, 'keys'.
end = len(keys)-1
if end < start: # If the last element is smaller than the first, then the array becomes valid because it is not sorted.
return -1
mp = start+(end-start)//2 # Calculate the midpoint
if target == keys[mp]: # Is the target at mp?
return mp
elif target < keys[mp]: # target is below the mp
return binary_search(keys, target, start = start, end = mp-1)
else:
return binary_search(keys, target, start = mp+1, end=end)
def left_finder(keys, target, start = 0, end = None):
'''Find the index of the left most target in a sorted array with duplicates'''
if end == None:
end = len(keys)-1
mp = binary_search(keys, target) # call binary search
if mp==0 or mp==-1: # mp is 1st element or DNE in keys
return mp
elif keys[mp-1] != target: # Left neighbor != target
return mp
else: # Keep searching moving left by shrinking end by one
return binary_search(keys, target, start=start, end=mp-1)
def right_finder(keys, target, start=0, end=None):
'''Find the index of the rightmost target in the sorted array with duplicates'''
if end == None:
end = len(keys)-1
mp = binary_search(keys, target)
if mp == len(keys)-1 or mp == -1: # mp is last element or DNE
return mp
elif keys[mp+1] != target: # Not last element, right neighbors != target
return mp
else: #keep searching pushing right ward
return binary_search(keys, target, start = mp+1, end=end)
def duplicate_binary_search(keys, target):
'''Uses left finder, right finder, and binary search to find all occurrences of a target in keys'''
all_occurrences = [] # container for situations with multiple occurrences of the target
left = left_finder(keys, target) # find a value in keys that qualifies as a possible left
right = right_finder(keys, target) # find a value in keys that qualifies as possible right
if left == right: # check if there is only 1 occurrence, or if target DNE in keys
return left
else: # append indices of all occurrences to list and return
for i in range(left, right+1):
all_occurrences.append(i)
return all_occurrences
In a nutshell, I'm piggybacking off of my binary search solution for sorted arrays with no duplicates.
I'm calling this binary search method in functions Left Finder and Right Finder, to try and find the left and right indices.
I then return the range of indices for which the duplicate elements are found.
Test Cases:
Test Four is currently failing.
I suspect that maybe just re-calling my original solution might be too crude, and I may need to re-imagine a Right Finder binary search from the ground up.
import unittest
class TestBinarySearch(unittest.TestCase):
def test_one(self):
keys = [1, 13, 42, 42, 42, 77, 78]
target = 42
results = duplicate_binary_search(keys, target)
self.assertListEqual(results, [2, 3, 4])
def test_two(self):
keys = [1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]
target = 4
results = duplicate_binary_search(keys, target)
self.assertListEqual(results, [3, 4, 5])
def test_three(self):
keys = [1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]
target = 5
results = duplicate_binary_search(keys, target)
self.assertListEqual(results, [6,7])
def test_four(self):
# Right most is last element
# This test currently fails to realize 10 as right most
keys = [1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]
target = 6
results = duplicate_binary_search(keys, target)
self.assertListEqual(results, [8, 9, 10])
def test_five(self):
# Left most is first element
keys = [1, 1, 1, 4, 4, 4, 5, 5, 6, 6, 6]
target = 1
results = duplicate_binary_search(keys, target)
self.assertListEqual(results, [0, 1, 2])
Update
Debugging the original right_finder() by hand was proving difficult, and as @greybeard pointed out, simply calling binary search again with an "end=end+1" might not solve the issue with different, longer arrays, so I decided it would be easier to re-write right_finder() as its own self-contained variant of binary search that calls itself recursively until it finds the rightmost occurrence or determines that it isn't there.
This new version seems to be holding up so far.
def right_finder(keys, target, start = 0, end = None):
if end is None:
end = len(keys)-1
if end < start:
return -1
mp = start+(end-start)//2
if mp == len(keys)-1 and keys[mp] == target:
return mp
elif mp != len(keys)-1 and target < keys[mp+1] and keys[mp] == target:
return mp
elif target < keys[mp]:
return right_finder(keys, target, start = start, end = mp-1)
else:
return right_finder(keys, target, start = mp+1, end = end)
2 Answers 2
Style issue
There is a style guide for Python called PEP 8 which is definitly worth reading.
Among other things:
- Avoid trailing whitespace anywhere
- Comparisons to singletons like None should always be done with is or is not, never the equality operators
More tests
Before performing any change in the code, it would be nice to improve the test suite. In particular:
- testing
left_finder
,right_finder
(andbinary_search
) - testing cases where do not have duplicated results to be found: the target may also be found 0 times or found once
- testing edge cases: empty lists, list with one element, etc
class TestBinarySearch(unittest.TestCase):
def test_not_found(self):
for keys in ([], [1], [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16]):
for target in (0, 4, 14, 17):
self.assertEqual(left_finder(keys, target), -1)
self.assertEqual(right_finder(keys, target), -1)
self.assertEqual(duplicate_binary_search(keys, target), -1)
def test_single_found(self):
for keys in ([1], [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16]):
for pos, target in enumerate(keys):
self.assertEqual(left_finder(keys, target), pos)
self.assertEqual(right_finder(keys, target), pos)
self.assertEqual(duplicate_binary_search(keys, target), pos)
def test_one(self):
keys = [1, 13, 42, 42, 42, 77, 78]
target = 42
self.assertEqual(left_finder(keys, target), 2)
self.assertEqual(right_finder(keys, target), 4)
self.assertListEqual(duplicate_binary_search(keys, target), [2, 3, 4])
def test_two(self):
keys = [1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]
target = 4
self.assertEqual(left_finder(keys, target), 3)
self.assertEqual(right_finder(keys, target), 5)
self.assertListEqual(duplicate_binary_search(keys, target), [3, 4, 5])
def test_three(self):
keys = [1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]
target = 5
self.assertEqual(left_finder(keys, target), 6)
self.assertEqual(right_finder(keys, target), 7)
self.assertListEqual(duplicate_binary_search(keys, target), [6, 7])
def test_four(self):
# Right most is last element
# This test currently fails to realize 10 as right most
keys = [1, 2, 3, 4, 4, 4, 5, 5, 6, 6, 6]
target = 6
self.assertEqual(left_finder(keys, target), 8)
self.assertEqual(right_finder(keys, target), 10)
self.assertListEqual(duplicate_binary_search(keys, target), [8, 9, 10])
def test_five(self):
# Left most is first element
keys = [1, 1, 1, 4, 4, 4, 5, 5, 6, 6, 6]
target = 1
self.assertEqual(left_finder(keys, target), 0)
self.assertEqual(right_finder(keys, target), 2)
self.assertListEqual(duplicate_binary_search(keys, target), [0, 1, 2])
We could also take this chance to give your testing function proper names.
This gives us:
- some confidence in most of the code
- some hints to understand the issue found:
right_finder
does not return the expected value.
Better signature for duplicate_binary_search
duplicate_binary_search
is expected to return:
a list of indices when the search returns multiple results
a single position when the search returns a single value
-1 when the search returns no result
It would be more consistent to result a list in all cases, which could contain 0, 1 or more elements.
def duplicate_binary_search(keys, target):
"""Uses left finder, right finder, and binary search to find all occurrences of a target in keys"""
# Find a value in keys that qualifies as a possible left/right
left, right = left_finder(keys, target), right_finder(keys, target)
# Check properties of values returned
assert left <= right
assert (left == -1) == (right == -1)
# Return range
return [] if left == -1 else list(range(left, right+1))
and the corresponding test update:
def test_not_found(self):
for keys in ([], [1], [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16]):
for target in (0, 4, 14, 17):
self.assertEqual(left_finder(keys, target), -1)
self.assertEqual(right_finder(keys, target), -1)
self.assertListEqual(duplicate_binary_search(keys, target), [])
def test_single_found(self):
for keys in ([1], [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16]):
for pos, target in enumerate(keys):
self.assertEqual(left_finder(keys, target), pos)
self.assertEqual(right_finder(keys, target), pos)
self.assertListEqual(duplicate_binary_search(keys, target), [pos])
Hint to understand the bug issue and fix it
right_finder
looks for the right-most occurrence by:
- looking for an occurrence
- using that occurrence to reduce the range searched
- looking for an occurrence in that reduced range
At step 3, there is no guarantee that there the value returned is the right-most.
More tests can be put in place to show the issue in different context. For instance:
def test_duplicate_in_middle(self):
target = 4
for keys, expected in (
([1, 4, 4, 7], [1, 2]),
([1, 4, 4, 4, 4, 4, 4, 4, 7], [1, 2, 3, 4, 5, 6, 7]),
([1, 2, 3, 4, 4, 4, 5, 6, 7], [3, 4, 5]),
):
self.assertTrue(binary_search(keys, target) in expected)
self.assertEqual(left_finder(keys, target), expected[0])
self.assertEqual(right_finder(keys, target), expected[-1])
self.assertListEqual(duplicate_binary_search(keys, target), expected)
-
\$\begingroup\$ Thanks! I'm still struggling to solve the issue with 'right_finder()', but I have taken steps to try and incorporate your advice into my code by adding better descriptive names for my test cases, incorporating your improvements, and by trimming white space. I can see that I have yet to include a condition that can definitely find a right-most element, but I'm still struggling to figure out what the condition is. I've tried adding an 'if keys[mp+1] == target, then continue binary searching from start=mp+1' to my code, but it's had no effect. \$\endgroup\$Stanley Yu– Stanley Yu2022年05月02日 18:07:12 +00:00Commented May 2, 2022 at 18:07
-
\$\begingroup\$ I'm glad if this helps. If you want your updated code to be reviewed, the usage is to create a new question for this. \$\endgroup\$SylvainD– SylvainD2022年05月03日 08:43:54 +00:00Commented May 3, 2022 at 8:43
-
1\$\begingroup\$ It really helped a lot! I was also able to come up with a fix to 'left_finder()' and 'right_finder()', and they're holding up so far. \$\endgroup\$Stanley Yu– Stanley Yu2022年05月03日 14:19:56 +00:00Commented May 3, 2022 at 14:19
First, I want to say thank you to everyone who offered their critiques and advice.
Not only did I learn a lot about binary search, I learned a lot about how to write better Python code.
def binary_search(keys, target, start = 0, end = None):
end = end if end is not None else len(keys) - 1
if end < start:
return -1
mp = start + (end - start) // 2
if target == keys[mp]:
return mp
elif target < keys[mp]:
return binary_search(keys, target, start = start, end = mp - 1)
else:
return binary_search(keys, target, start = mp + 1, end = end)
def left_finder(keys, target, start = 0, end = None):
end = end if end is not None else len(keys) - 1
if end < start:
return -1
mp = start + (end - start) / /2
if ((mp == 0 or keys[mp - 1] < target) and keys[mp] == target):
return mp
elif target <= keys[mp]:
return left_finder(keys, target, start = start, end = mp - 1)
else:
return left_finder(keys, target, start = mp + 1, end = end)
def right_finder(keys, target, start = 0, end = None):
end = end if end is not None else len(keys) - 1
if end < start:
return -1
mp = start + (end-start) // 2
if ((mp == len(keys) - 1 or target < keys[mp + 1]) and keys[mp] == target):
return mp
elif target < keys[mp]:
return right_finder(keys, target, start = start, end = mp - 1)
else:
return right_finder(keys, target, start = mp + 1, end = end)
def duplicate_binary_search(keys, target):
left, right = left_finder(keys, target), right_finder(keys, target)
assert left <= right
assert (left == -1) == (right == -1)
return [] if left == -1 else list(range(left, right + 1))
-
1\$\begingroup\$ If you want use to review the updated code please post it as a follow up question with a link back to this question. \$\endgroup\$2022年05月03日 16:14:51 +00:00Commented May 3, 2022 at 16:14
-
3\$\begingroup\$ Welcome to Code Review! While OPs are encouraged to answer their own questions bear in mind that "Every answer must make at least one insightful observation about the code in the question. Answers that merely provide an alternate solution with no explanation or justification do not constitute valid Code Review answers and may be deleted."... \$\endgroup\$2022年05月03日 16:32:33 +00:00Commented May 3, 2022 at 16:32
-
\$\begingroup\$ If you explain what you've modified and, more importantly, why, this would be a review. \$\endgroup\$2022年05月03日 17:45:11 +00:00Commented May 3, 2022 at 17:45
Explore related questions
See similar questions with these tags.
'start' and 'end' argument specifying the index to start and end the search on
While it is conventional to have start inclusive, your searches seem to handle end that way, too - in contrast to Python practice. \$\endgroup\$binary_search(keys, target, start = mp+1, end=end+1)
inright_finder()
will fix your directional finders for longer runs of repeated keys (check what happens with many1
s). As you are reinventingbisect.bisect_left()
&bisect_right()
, have a peek at their (possible/conceptual) implementation. \$\endgroup\$