Locating a sequence in a sorted array

Question 1

I read the following question: Searching an element in a sorted array and I thought that I could give it a try in Python.

Given a sorted list of integers and an integer. Return the (index) bounds of the sequence of this element in the list.

Example:

l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9]
0 not found in list
1:(0, 0)
2:(1, 2)
3:(3, 5)
4:(6, 11)
5 not found in list
6:(12, 12)
7:(13, 13)
8:(14, 14)
9:(15, 18)

Here is my program (using Python 3). It uses a dichotomic search and returns the bounds to help the search of the start and end of the sequence.

def main():
 l = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 6, 7, 8, 9, 9, 9, 9]
 print(l)
 for i in range(10):
 try:
 print(find_sequence(i, l))
 except Exception as e:
 print(str(e))
def find_sequence(x, l):
 """Return a tuple (begin, end) containing the index bounds of the sequence of x"""
 left, found, right = dichotomic_search(x, l)
 begin = outside_bound(x, l, left, found)
 end = outside_bound(x, l, right, found)
 return begin, end
def outside_bound(x, l, outside, inside):
 """Return the outside bound of the sequence"""
 if l[outside] == x:
 return outside
 middle = -1
 previous_middle = -2
 while middle != previous_middle:
 previous_middle = middle
 middle = (outside + inside) // 2
 if l[middle] == x:
 inside = middle
 else:
 outside = middle
 return inside
def dichotomic_search(x, l):
 """Return a tuple of indexes (left, found, right)
 left: leftmost index where x might be found
 found: index where x is
 right: rightmost index where x might be found
 """
 left = 0
 right = len(l) - 1
 if l[left] > x or l[right] < x:
 raise Exception(str(x) + ' not found in list')
 if l[left] == x:
 return left, left, right
 if l[right] == x:
 return left+1, right, right # we know that l[left]!=x
 while left < right:
 middle = (left + right) // 2
 if l[middle] == x:
 return left, middle, right
 elif l[middle] < x:
 left = middle + 1 # to prevent fixed point
 elif l[middle] > x:
 right = middle # impossible to do -1 because of the integer division
 raise Exception(str(x) + ' not found in list')
if __name__ == "__main__":
 main()

I'm not so fond of the middle != previous_middle, but I didn't find a more elegant way (yet).

Question 2

Do you know about bisect (i.e. did you implement that search yourself on purpose)?

Question 3

I implemented it on purpose (and I think using bisect the search for the second extremity of the sequence can't be optimized - whereas I'm reusing the bounds of my dichotomic search)

Question 4

Why not just bisect_left(l, x), bisect_right(l, x) - 1? The worry about reusing the bounds seems misplaced to me. In the average case you're searching an array that's half the size—but it's a binary search, so it only saves you one iteration. Whereas the bisect module has a fast C implementation.

Question 5

"it only saves you one iteration" only if you search the element. If you search the bounds, it might turn out to be worse. For instance, search for 1: {0, 1, 2, 2, 2, 2, 2, 2, 2}.

Question 6

I am afraid you're approach is too complicated. Just create a function that finds the first occurrence of an element using a binary search, an another function that finds the last one. And then you can write:

def find_indexes(xs, x)
 start = find_first_index(xs, x)
 return ((start, find_last_index(xs, x, start=start)) if start else None)

For a simple implementation of find_first_index and find_last_index, check bisect.bisect_left and bisect.bisect_right source code here.

Question 7

My approach might be complicated, but it aims at being efficient. When you call find_last_index, The first part of the work might have already been done by the find_first_index (the part "finding the index"). What I could do though, is "improve" bisect.bisect_left to remember the first time the element was found (to use it as a bound for find_last_index)

Question 8

In find_last_index, one could use first_index as a lower bound for instance!

Question 9

Yeah, I know, but in a O(log n) algorithm like a binary-search it does not matter much. Anyway, updated, the bisect functions have such argument.

Question 10

(you could use the same name as bisect: lo)

Question 11

Following @tokland suggestion, here is what I came with: The find_left_bound returns some bounds that can be used by find_right_bound to reduce the search area.

def main():
 l = [1,2,2,3,3,3,4,4,4,4,4,4,6,7,8,9,9,9,9]
 print(l)
 for i in range(10):
 try:
 print(str(i) + ': ' + str(find_sequence(l, i)))
 except Exception as e:
 print(str(e))
def find_sequence(l, x):
 """Return a tuple (begin, end) containing the index bounds of the sequence of x"""
 begin, (left, right) = find_left_bound(l, x)
 if l[begin] != x:
 raise Exception(str(x) + ' not found in list')
 end = find_right_bound(l, x, left, right)
 return begin, end
def find_left_bound(l, x):
 left = 0
 right = len(l)
 lbound = left # bounds for the 'right_bound' search
 rbound = right
 while left < right:
 middle = (left + right) // 2
 if l[middle] < x:
 left = middle + 1
 else:
 right = middle
 # If it's relevant, improve the bounds for the right search
 if l[middle] > x:
 rbound = middle
 elif middle > lbound:
 lbound = middle
 return left, (lbound, rbound)
def find_right_bound(l, x, left, right):
 while left < right:
 middle = (left + right) // 2
 if l[middle] > x:
 right = middle
 else:
 left = middle + 1
 return left - 1
if __name__ == "__main__":
 main()

I should rename find_left_bound (since it does actually a bit more), but I'm unable to find a suitable name...

I don't see other ways to improve the efficiency.

Question 12

I don't have performance notes, but you shouldn't be raising bare exceptions. There's more specific ones you can raise, so you should raise whatever's relevant to the problem. In this case? It's a ValueError, as the user has supplied values that are invalid.

raise ValueError(str(x) + ' not found in list')

This has the added bonus that you don't need to catch other, unrelated errors with your try except in main.

tokland 11.2k1 gold badge21 silver badges26 bronze badges · Answer 1 · 2015-10-24 15:09:00Z

1

\$\begingroup\$

I am afraid you're approach is too complicated. Just create a function that finds the first occurrence of an element using a binary search, an another function that finds the last one. And then you can write:

def find_indexes(xs, x)
 start = find_first_index(xs, x)
 return ((start, find_last_index(xs, x, start=start)) if start else None)

For a simple implementation of find_first_index and find_last_index, check bisect.bisect_left and bisect.bisect_right source code here.

Share

edited Oct 25, 2015 at 10:06

answered Oct 24, 2015 at 15:09

tokland's user avatar

tokland

11.2k1 gold badge21 silver badges26 bronze badges

\$\endgroup\$

4

\$\begingroup\$ My approach might be complicated, but it aims at being efficient. When you call find_last_index, The first part of the work might have already been done by the find_first_index (the part "finding the index"). What I could do though, is "improve" bisect.bisect_left to remember the first time the element was found (to use it as a bound for find_last_index) \$\endgroup\$

oliverpool
– oliverpool

2015年10月25日 08:12:58 +00:00
Commented Oct 25, 2015 at 8:12
\$\begingroup\$ In find_last_index, one could use first_index as a lower bound for instance! \$\endgroup\$

oliverpool
– oliverpool

2015年10月25日 08:14:50 +00:00
Commented Oct 25, 2015 at 8:14
\$\begingroup\$ Yeah, I know, but in a O(log n) algorithm like a binary-search it does not matter much. Anyway, updated, the bisect functions have such argument. \$\endgroup\$

tokland
– tokland

2015年10月25日 10:07:05 +00:00
Commented Oct 25, 2015 at 10:07
\$\begingroup\$ (you could use the same name as bisect: lo) \$\endgroup\$

oliverpool
– oliverpool

2015年10月25日 10:29:28 +00:00
Commented Oct 25, 2015 at 10:29

Add a comment |

oliverpool 1,92212 silver badges28 bronze badges · Answer 2 · 2015-10-25 13:20:09Z

Following @tokland suggestion, here is what I came with: The find_left_bound returns some bounds that can be used by find_right_bound to reduce the search area.

def main():
 l = [1,2,2,3,3,3,4,4,4,4,4,4,6,7,8,9,9,9,9]
 print(l)
 for i in range(10):
 try:
 print(str(i) + ': ' + str(find_sequence(l, i)))
 except Exception as e:
 print(str(e))
def find_sequence(l, x):
 """Return a tuple (begin, end) containing the index bounds of the sequence of x"""
 begin, (left, right) = find_left_bound(l, x)
 if l[begin] != x:
 raise Exception(str(x) + ' not found in list')
 end = find_right_bound(l, x, left, right)
 return begin, end
def find_left_bound(l, x):
 left = 0
 right = len(l)
 lbound = left # bounds for the 'right_bound' search
 rbound = right
 while left < right:
 middle = (left + right) // 2
 if l[middle] < x:
 left = middle + 1
 else:
 right = middle
 # If it's relevant, improve the bounds for the right search
 if l[middle] > x:
 rbound = middle
 elif middle > lbound:
 lbound = middle
 return left, (lbound, rbound)
def find_right_bound(l, x, left, right):
 while left < right:
 middle = (left + right) // 2
 if l[middle] > x:
 right = middle
 else:
 left = middle + 1
 return left - 1
if __name__ == "__main__":
 main()

I should rename find_left_bound (since it does actually a bit more), but I'm unable to find a suitable name...

I don't see other ways to improve the efficiency.

SuperBiasedMan 13.5k5 gold badges37 silver badges62 bronze badges · Answer 3 · 2015-10-28 15:44:10Z

I don't have performance notes, but you shouldn't be raising bare exceptions. There's more specific ones you can raise, so you should raise whatever's relevant to the problem. In this case? It's a ValueError, as the user has supplied values that are invalid.

raise ValueError(str(x) + ' not found in list')

This has the added bonus that you don't need to catch other, unrelated errors with your try except in main.

Stack Exchange Network

Locating a sequence in a sorted array

3 Answers 3

You must log in to answer this question.

Linked

Hot Network Questions

Locating a sequence in a sorted array

3 Answers 3

You must log in to answer this question.

Linked

Related

Hot Network Questions