Implementing basic sorting algorithms to learn them, and coding, better. Criticisms/ critiques welcome. Also possible optimizations.
import unittest
import random
def insertion_sort(seq):
"""Accepts a mutable sequence, utilizes insertion sort algorithm to sort in
place. Returns an ordered list"""
#marks point between ordered sublist & unordered list
partition = 1
while partition < len(seq):
temp = partition
#while temp not in correct pos in sublist, decrement pos
while temp != 0 and seq[temp] < seq[temp-1]:
seq[temp], seq[temp-1] = seq[temp-1], seq[temp]
temp -= 1
partition += 1
return seq
class test_insertionsort(unittest.TestCase):
def test_insertionsort(self):
"""Test insertion_sort()"""
seq = [random.randrange(0, 1000) for _ in range(1000)]
self.assertEqual(insertion_sort(seq), sorted(seq))
if __name__ == '__main__':
unittest.main()
3 Answers 3
One thing that can speed up insertion sort is to pull the partition value to the side and slide elements up into the vacancy until the correct location is found, then drop the partition value into that location. This eliminates one of the three assignments in the while
loop:
for p in range(1, len(seq)):
if seq[p] >= seq[p-1]:
continue
p_value = seq[p]
while p != 0 and p_value < seq[p-1]:
seq[p] = seq[p-1]
p -= 1
seq[p] = p_value
return seq
The unit test shows a substantial speed improvement from doing this.
Answers from Gareth Rees to your other questions about selection sort and bubble sort still apply here but it is clear that you tried to take them into account already.
You can loop with for partition in range(1, len(seq)):
. Also, it removes the need for an additional variable as it doesn't really matter anymore if you lose the current value of partition
: it will have the right value at the beginning of the next iteration anyway.
Here's what you could do :
for p in range(1, len(seq)):
while p != 0 and seq[p] < seq[p-1]:
seq[p], seq[p-1] = seq[p-1], seq[p]
p -= 1
return seq
Insertion sorts get a bad rap, mostly because people use them the wrong way. Sure the bench mark for all sorting algorithms is to sort an array in place, but in the real world your data doesn't always come to you in a complete package, sometimes it arrives asynchronously.
Instead of force-fitting an insertion sort to existing data, let's redefine the problem to find the best context for an insertion sort:
import unittest
import random
class SortedList(list):
def insert_item(self, new_item):
insertion_point = len(self)
for i, item in enumerate(self):
if new_item < item:
insertion_point = i
break
self.insert(insertion_point, new_item)
class test_insertionsort(unittest.TestCase):
def test_insertionsort(self):
"""Test insertion_sort()"""
sl = SortedList()
seq = []
for i in range(10000):
item = random.randrange(0, 10000)
sl.insert_item(item)
seq.append(item)
self.assertEqual(sl, sorted(seq))
if __name__ == '__main__':
unittest.main()
Here, instead of generating all the data before the sort, it is fed into the sort as it appears.
(And note that a ~2x improvement is possible to optimize the insertion by starting in the middle of the list instead of always from the beginning).
-
1\$\begingroup\$ Hi, welcome to Code Review! Here, an answer should be about the code in the question. It's ok to present an alternative implementation, but then you should explain what issues it fixes, or in what way it's more optimal than the code in the question. \$\endgroup\$janos– janos2015年10月29日 08:16:42 +00:00Commented Oct 29, 2015 at 8:16
Explore related questions
See similar questions with these tags.
seq.pop
andseq.insert
is not really in the spirit of the exercise. These methods hide significant details of what the algorithm is doing. To really understand what is going on, you should try to implement it using only list assignment. \$\endgroup\$