I've created a function that uses binary search to insert an element into an array (similar to this question). I haven't benchmarked it, but based on my knowledge it should get \$O(1)\$ in the best case (i.e., appending an item to the end of the list) and \$O(n\log n)\$ in the average/worst case (I have no numbers to back this up - frankly I don't have much experience with benchmarking, so have mercy on me.)
Here's the binary search algorithm:
def binary_search(a, x):
mid = 0
min = 0
max = len(a)
# Deal with the edge cases
if x < a[min]:
return -1
if x > a[max-1]:
return max
# Now that we know that the value is in range,
# perform the actual search
while min < max:
mid = mid_point(min,max)
if x < a[mid]:
max = mid - 1
elif x > a[mid]:
min = mid + 1
else:
return mid
# Another edge case
return min if a[min] >= x else min + 1
This will return the index to insert the element into. The function used to perform the insertion is simple:
def binary_insert(array, value):
index = binary_search(array,value)
if index < 0: # Just append the value to the end of the list
array.insert(0,value)
else:
array.insert(index,value)
The function works on a pre-sorted list (a requirement of a binary search) and maintains the list in sorted order. For example inserting [0..10)
(aka [0,1,2,...,8,9]
) into [-1, 0, 1, 4, 5, 6, 7, 8, 9, 10]
yields [-1, 0, 0, 1, 1, 2, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10]
You may have noticed that the return statement (the last one) for binary_search
is very messy, and not very intuitive. Would like to be able to incorporate that logic into the while
loop somehow, or at least simplify it. Does anyone have an idea of how I could do this? (Side question: how do I benchmark this?)
EDIT:
This is the mid_point
function.
def mid_point(x, y):
return (x+y)/2
-
\$\begingroup\$ While you can easily make the best case take a number of primitive operations independent of the length \$n\$ of the list ("array"), the binary search for the insertion point as presented will take more. Similarly, for a single insertion, it should not be difficult to find a tighter bound than \$n\log n\$. \$\endgroup\$greybeard– greybeard2025年03月04日 15:52:49 +00:00Commented Mar 4 at 15:52
3 Answers 3
If you want to go deep into the implementation of binary search and insertion, my advice is to take a look at the bisect module in the standard library:
def insort_left(a, x, lo=0, hi=None):
"""Insert item x in list a, and keep it sorted assuming a is sorted.
If x is already in a, insert it to the left of the leftmost x.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if a[mid] < x: lo = mid+1
else: hi = mid
a.insert(lo, x)
def bisect_left(a, x, lo=0, hi=None):
"""Return the index where to insert item x in list a, assuming a is sorted.
The return value i is such that all e in a[:i] have e < x, and all e in
a[i:] have e >= x. So if x already appears in the list, a.insert(x) will
insert just before the leftmost x already there.
Optional args lo (default 0) and hi (default len(a)) bound the
slice of a to be searched.
"""
if lo < 0:
raise ValueError('lo must be non-negative')
if hi is None:
hi = len(a)
while lo < hi:
mid = (lo+hi)//2
if a[mid] < x: lo = mid+1
else: hi = mid
return lo
-
\$\begingroup\$ Is this the code used by the module? (I never knew about the
//
operator, that's cool) \$\endgroup\$sotrh– sotrh2014年07月15日 19:36:49 +00:00Commented Jul 15, 2014 at 19:36 -
1\$\begingroup\$ @sotrh Yes, that's the code in python 2.7.6 stdlib. Note that
//
is floor division whose behavior is consistent in python 2 and 3, but regular/
doesn't behave the same way. More information here. \$\endgroup\$jcollado– jcollado2014年07月15日 21:34:57 +00:00Commented Jul 15, 2014 at 21:34 -
\$\begingroup\$ So basically
insort_left
andbisect_left
do the same thing exceptinsort_left
insertsx
into the list whilebisect_left
just returns where it should be inserted? \$\endgroup\$sotrh– sotrh2014年07月15日 21:43:06 +00:00Commented Jul 15, 2014 at 21:43 -
\$\begingroup\$ @sotrh Correct. As pointed out by the documentation
bisect.insort_left
is equivalent toa.insert(bisect.bisect_left(a, x, lo, hi), x)
\$\endgroup\$jcollado– jcollado2014年07月15日 21:53:54 +00:00Commented Jul 15, 2014 at 21:53 -
\$\begingroup\$ It's good to know about this module. I wrote my code as a mock which I'll use to write the java equivalent \$\endgroup\$sotrh– sotrh2014年07月15日 21:57:09 +00:00Commented Jul 15, 2014 at 21:57
Here are some thoughts:
For benchmarking there are two modules folks usually use:
cProfile
I put an example in a previous post located here:
Also, can you please post the entire script? Just want to make sure I have full context. Namely mid_point.
-
1\$\begingroup\$ I updated the post to include
mid_point
\$\endgroup\$sotrh– sotrh2014年07月15日 18:25:09 +00:00Commented Jul 15, 2014 at 18:25 -
\$\begingroup\$ I tried the
bettertimeit
module using the code from your other answer, but I'm gettingImportError: No module named bettertimeit
when I try to run it. \$\endgroup\$sotrh– sotrh2014年07月15日 19:44:27 +00:00Commented Jul 15, 2014 at 19:44 -
\$\begingroup\$ I'm running python 2.7.6 by the way. \$\endgroup\$sotrh– sotrh2014年07月15日 19:45:46 +00:00Commented Jul 15, 2014 at 19:45
-
\$\begingroup\$ @sotrh
bettertimeit
is not part of the stdlib. Your probably need to install it. \$\endgroup\$jcollado– jcollado2014年07月15日 21:36:01 +00:00Commented Jul 15, 2014 at 21:36 -
\$\begingroup\$ how might I do that? \$\endgroup\$sotrh– sotrh2014年07月15日 21:41:08 +00:00Commented Jul 15, 2014 at 21:41
linear cost
if index < 0: # Just append the value to the end of the list array.insert(0, value) else: array.insert(index, value)
Please understand that .insert()
here is expensive.
You would be much better off using
heapq.
But why are you working that hard? This is a solved problem. Simply rely upon the sortedcontainers package. It's fast, and it's correct.
Rather than an if
,
it would be more natural to talk about max(index, 0)
.
Also, most people would call that "prepend". Appending to a list happens in \$O(1)\$ time, while insertions cost \$O(n)\$ linear time to move \$n\$ existing elements out of the way.
-
1\$\begingroup\$ (One reason not to use sortedcontainers in July 2014 is that project not reaching 0.9 before September.) \$\endgroup\$greybeard– greybeard2025年03月04日 15:59:15 +00:00Commented Mar 4 at 15:59
Explore related questions
See similar questions with these tags.