Insertion sort implementation in Python

Question 1

I came up with a small optimization while practice coding insertion sort in Python. It's mostly about not accessing the input array too much and letting the hardware do element shifts instead of swapping manually. Could someone comment on my coding style as well as the optimization idea?

Original Version

# this version of insertion sort is half as fast as selection sort on my machine
def runslow(input):
 for i in range(1,len(input)):
 j = i
 # list being accessed twice in this comparison
 while j > 0 and input[j] < input[j-1]:
 # unnecessary writes creating slowdown
 input[j-1], input[j] = input[j], input[j-1]
 j -= 1

Improved Version

# the optimized insertion sort
def run(input):
 for i in range(1,len(input)):
 j = i
 # extract the value being compared
 saved = input[i]
 # now list isn't being accessed twice unnecessarily
 while(j > 0 and input[j-1] > saved):
 # don't actually perform the intermediate swaps,
 # just let the cursor settle into the final position
 # input[j - 1], input[j] = input[j], input[j - 1]
 j-=1
 # now insert it and let the hardware do the actual shifting of the values in the list
 input.insert(j,input.pop(i))
 return input

Question 2

TL;DR

Be as specific as you can:

When naming functions,
And when setting up loops.

Details:

Here's what I've noticed in your code, line-by-line:

def run(input)

Avoid common names for your functions. While it's not always possible to prevent having the same name as functions from other modules (there are methods to deal with that), naming a function run is asking for trouble. For example, a commonly used command in Python test-suites is also run.

Your function name should match as closely as possible the specific goal of the function. e.g. maybe call it insertion_sort.

for i in range(1,len(input)):

Try to use the most specific iterators possible.

The general construction for i in range(... gets used a lot where it doesn't have to be. Any time you see it, there is usually a better way iterate, whether using a list comprehension, using Itertools, or other more specific methods. In your case, replacing for i in range( with for i in input results helps a lot (as in my example below).

The next two lines j = 1 and saved = input[i] are only there to convert i from the non-specific `for I in range (...' to the values you actually want. They can be elimited by changing how the loop is set up.

 while(j > 0 and input[j-1] > saved):

The same goes for your while loop. You will never need a i+=1 (or j-=1) to increment a loop in Python. It's always best to think of what information the inside of the loop needs, and set up an iterator, list comprehension, or for loop that immediately provides that information.

Here is a modified version of your code, to show what I mean about being specific with loops:

def insertion_sort(input):
 for i,n in enumerate(input):
 ins_pos = 0 # default location to insert n
 for j,m in reversed([r for r in enumerate(input[:i]])):
 if m > n:
 ins_pos = j + 1 # insert n one position after first larger number found
 break
 input.insert(ins_pos, input.pop(i)) 
 return input

Once inside the loop, you only need:

i and j, the indices of the values to pop and insert.
n and m, the values to compare.

Once you know what the inside of the loop needs, set up the loop statements (e.g. In my example, for i,n in enumerate(input), and for m in reversed(input[i])) so they return only what is needed. This eliminates the extra lines in your code. Also, after this modification, the purpose of each loop is more clear.

 input.insert(j,input.pop(i))

PEP8 specifies spaces between arguments in function calls.

And a note on the last line:

Try not to modify a list that is the source of your iterator.

In general, it's a bad idea to modify the same list that is being iterated over. Your example only works because the pop and insert functions get applied to the parts of the input list that have already been read through by the iterator. If you need to do it in future, be aware of why this works, and how other examples of the same practice might go wrong. I've assumed that part of your spec is to modify the list in place, so I've left it how it is.

Performance:

Nice work with eliminating the extra slice steps. Re-writing the list with each step in the inner while loop is indeed more expensive than just comparing values and moving on. However, both pop, and insert within a list are both \$\mathcal{O} (n)\,ドル not \$\mathcal{O} (1)\,ドル sad but true. There may be a small speed-up of a constant factor, doing away with the slices, but both your versions of insertion sort are ultimately limited by the time complexity of the algorithm itself, which is \$\mathcal{O} (n^2)\,ドル where n is the length of the list.

This line:

 saved = input[i]

Won't change anything as, in Python, this sets up saved as a pointer to the value in input[i], rather than making a copy and storing it in saved. You could explicitly force a copy, but this would probably be self-defeating, as reading an index from an array is no more expensive than accessing any other int.

Question 3

Having the saved = input[i] line externalized, instead of having the input[i] inside the while loop makes the algo 12% faster. Maybe the interpreter is caching local variables in the background.

Question 4

Maybe. I tried to repeat what you describe, but I get a random variation in time of about +/- 20 % with a list of 400 integers. Anything larger than that, and most applications are going to switch to quicksort or heapsort. Chasing a12 % improvement in insertion sort is going to depend on the interpreter, as you say.

Ryan Mills Ryan Mills 1,0105 silver badges6 bronze badges · Accepted Answer · 2017-02-18 06:12:59Z

TL;DR

Be as specific as you can:

When naming functions,
And when setting up loops.

Details:

Here's what I've noticed in your code, line-by-line:

def run(input)

Avoid common names for your functions. While it's not always possible to prevent having the same name as functions from other modules (there are methods to deal with that), naming a function run is asking for trouble. For example, a commonly used command in Python test-suites is also run.

Your function name should match as closely as possible the specific goal of the function. e.g. maybe call it insertion_sort.

for i in range(1,len(input)):

Try to use the most specific iterators possible.

The general construction for i in range(... gets used a lot where it doesn't have to be. Any time you see it, there is usually a better way iterate, whether using a list comprehension, using Itertools, or other more specific methods. In your case, replacing for i in range( with for i in input results helps a lot (as in my example below).

The next two lines j = 1 and saved = input[i] are only there to convert i from the non-specific `for I in range (...' to the values you actually want. They can be elimited by changing how the loop is set up.

 while(j > 0 and input[j-1] > saved):

The same goes for your while loop. You will never need a i+=1 (or j-=1) to increment a loop in Python. It's always best to think of what information the inside of the loop needs, and set up an iterator, list comprehension, or for loop that immediately provides that information.

Here is a modified version of your code, to show what I mean about being specific with loops:

def insertion_sort(input):
 for i,n in enumerate(input):
 ins_pos = 0 # default location to insert n
 for j,m in reversed([r for r in enumerate(input[:i]])):
 if m > n:
 ins_pos = j + 1 # insert n one position after first larger number found
 break
 input.insert(ins_pos, input.pop(i)) 
 return input

Once inside the loop, you only need:

i and j, the indices of the values to pop and insert.
n and m, the values to compare.

Once you know what the inside of the loop needs, set up the loop statements (e.g. In my example, for i,n in enumerate(input), and for m in reversed(input[i])) so they return only what is needed. This eliminates the extra lines in your code. Also, after this modification, the purpose of each loop is more clear.

 input.insert(j,input.pop(i))

PEP8 specifies spaces between arguments in function calls.

And a note on the last line:

Try not to modify a list that is the source of your iterator.

In general, it's a bad idea to modify the same list that is being iterated over. Your example only works because the pop and insert functions get applied to the parts of the input list that have already been read through by the iterator. If you need to do it in future, be aware of why this works, and how other examples of the same practice might go wrong. I've assumed that part of your spec is to modify the list in place, so I've left it how it is.

Performance:

Nice work with eliminating the extra slice steps. Re-writing the list with each step in the inner while loop is indeed more expensive than just comparing values and moving on. However, both pop, and insert within a list are both \$\mathcal{O} (n)\,ドル not \$\mathcal{O} (1)\,ドル sad but true. There may be a small speed-up of a constant factor, doing away with the slices, but both your versions of insertion sort are ultimately limited by the time complexity of the algorithm itself, which is \$\mathcal{O} (n^2)\,ドル where n is the length of the list.

This line:

 saved = input[i]

Won't change anything as, in Python, this sets up saved as a pointer to the value in input[i], rather than making a copy and storing it in saved. You could explicitly force a copy, but this would probably be self-defeating, as reading an index from an array is no more expensive than accessing any other int.

Having the saved = input[i] line externalized, instead of having the input[i] inside the while loop makes the algo 12% faster. Maybe the interpreter is caching local variables in the background.
Maybe. I tried to repeat what you describe, but I get a random variation in time of about +/- 20 % with a list of 400 integers. Anything larger than that, and most applications are going to switch to quicksort or heapsort. Chasing a12 % improvement in insertion sort is going to depend on the interpreter, as you say.

Stack Exchange Network

Insertion sort implementation in Python

1 Answer 1

TL;DR

Details:

Performance:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Insertion sort implementation in Python

1 Answer 1

TL;DR

Details:

Performance:

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions