I read the following question: Searching an element in a sorted array and I thought that I could give it a try in Python.
Given a sorted list of integers and an integer. Return the (index) bounds of the sequence of this element in the list.
Example:
l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9]
0 not found in list
1:(0, 0)
2:(1, 2)
3:(3, 5)
4:(6, 11)
5 not found in list
6:(12, 12)
7:(13, 13)
8:(14, 14)
9:(15, 18)
Here is my program (using Python 3). It uses a dichotomic search and returns the bounds to help the search of the start and end of the sequence.
def main():
l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9]
print(l)
for i in range(10):
try:
print(find_sequence(i, l))
except Exception as e:
print(str(e))
def find_sequence(x, l):
"""Return a tuple (begin, end) containing the index bounds of the sequence of x"""
left, found, right = dichotomic_search(x, l)
begin = outside_bound(x, l, left, found)
end = outside_bound(x, l, right, found)
return begin, end
def outside_bound(x, l, outside, inside):
"""Return the outside bound of the sequence"""
if l[outside] == x:
return outside
middle = -1
previous_middle = -2
while middle != previous_middle:
previous_middle = middle
middle = (outside + inside) // 2
if l[middle] == x:
inside = middle
else:
outside = middle
return inside
def dichotomic_search(x, l):
"""Return a tuple of indexes (left, found, right)
left: leftmost index where x might be found
found: index where x is
right: rightmost index where x might be found
"""
left = 0
right = len(l) - 1
if l[left] > x or l[right] < x:
raise Exception(str(x) + ' not found in list')
if l[left] == x:
return left, left, right
if l[right] == x:
return left+1, right, right # we know that l[left]!=x
while left < right:
middle = (left + right) // 2
if l[middle] == x:
return left, middle, right
elif l[middle] < x:
left = middle + 1 # to prevent fixed point
elif l[middle] > x:
right = middle # impossible to do -1 because of the integer division
raise Exception(str(x) + ' not found in list')
if __name__ == "__main__":
main()
I'm not so fond of the middle != previous_middle, but I didn't find a more elegant way (yet).
3 Answers 3
I am afraid you're approach is too complicated. Just create a function that finds the first occurrence of an element using a binary search, an another function that finds the last one. And then you can write:
def find_indexes(xs, x)
start = find_first_index(xs, x)
return ((start, find_last_index(xs, x, start=start)) if start else None)
For a simple implementation of find_first_index and find_last_index, check bisect.bisect_left and bisect.bisect_right source code here.
-
\$\begingroup\$ My approach might be complicated, but it aims at being efficient. When you call
find_last_index, The first part of the work might have already been done by thefind_first_index(the part "finding the index"). What I could do though, is "improve"bisect.bisect_leftto remember the first time the element was found (to use it as a bound forfind_last_index) \$\endgroup\$oliverpool– oliverpool2015年10月25日 08:12:58 +00:00Commented Oct 25, 2015 at 8:12 -
\$\begingroup\$ In
find_last_index, one could usefirst_indexas a lower bound for instance! \$\endgroup\$oliverpool– oliverpool2015年10月25日 08:14:50 +00:00Commented Oct 25, 2015 at 8:14 -
\$\begingroup\$ Yeah, I know, but in a O(log n) algorithm like a binary-search it does not matter much. Anyway, updated, the bisect functions have such argument. \$\endgroup\$tokland– tokland2015年10月25日 10:07:05 +00:00Commented Oct 25, 2015 at 10:07
-
\$\begingroup\$ (you could use the same name as bisect:
lo) \$\endgroup\$oliverpool– oliverpool2015年10月25日 10:29:28 +00:00Commented Oct 25, 2015 at 10:29
Following @tokland suggestion, here is what I came with:
The find_left_bound returns some bounds that can be used by find_right_bound to reduce the search area.
def main():
l = [1,2,2,3,3,3,4,4,4,4,4,4,6,7,8,9,9,9,9]
print(l)
for i in range(10):
try:
print(str(i) + ': ' + str(find_sequence(l, i)))
except Exception as e:
print(str(e))
def find_sequence(l, x):
"""Return a tuple (begin, end) containing the index bounds of the sequence of x"""
begin, (left, right) = find_left_bound(l, x)
if l[begin] != x:
raise Exception(str(x) + ' not found in list')
end = find_right_bound(l, x, left, right)
return begin, end
def find_left_bound(l, x):
left = 0
right = len(l)
lbound = left # bounds for the 'right_bound' search
rbound = right
while left < right:
middle = (left + right) // 2
if l[middle] < x:
left = middle + 1
else:
right = middle
# If it's relevant, improve the bounds for the right search
if l[middle] > x:
rbound = middle
elif middle > lbound:
lbound = middle
return left, (lbound, rbound)
def find_right_bound(l, x, left, right):
while left < right:
middle = (left + right) // 2
if l[middle] > x:
right = middle
else:
left = middle + 1
return left - 1
if __name__ == "__main__":
main()
I should rename find_left_bound (since it does actually a bit more), but I'm unable to find a suitable name...
I don't see other ways to improve the efficiency.
I don't have performance notes, but you shouldn't be raising bare exceptions. There's more specific ones you can raise, so you should raise whatever's relevant to the problem. In this case? It's a ValueError, as the user has supplied values that are invalid.
raise ValueError(str(x) + ' not found in list')
This has the added bonus that you don't need to catch other, unrelated errors with your try except in main.
You must log in to answer this question.
Explore related questions
See similar questions with these tags.
bisect(i.e. did you implement that search yourself on purpose)? \$\endgroup\$bisect_left(l, x), bisect_right(l, x) - 1? The worry about reusing the bounds seems misplaced to me. In the average case you're searching an array that's half the size—but it's a binary search, so it only saves you one iteration. Whereas thebisectmodule has a fast C implementation. \$\endgroup\$1:{0, 1, 2, 2, 2, 2, 2, 2, 2}. \$\endgroup\$