I've written a implementation of a merge sort in Python. It seems correct and I have tested it with this:
l = list(range(1000))
random.shuffle(l) # random.shuffle is in-place
mergesort(l) == list(range(1000)) # returns True
# Stable sort
cards = list(itertools.product(range(4), range(13)))
random.shuffle(cards)
cards = mergesort(cards, key=lambda x: x[1]) # Sort by rank
cards = mergesort(cards, key=lambda x: x[0]) # Sort by suit
cards == list(itertools.product(range(4), range(13))) # returns True
I've also tested its performance, comparing it to sorted
and the merge sort implementation in rosetta code:
rl = list(range(100))
random.shuffle(rl)
%timeit sorted(rl)
# 100000 loops, best of 3: 11.3 μs per loop
%timeit mergesort(rl) # My code
# 1000 loops, best of 3: 376 μs per loop
%timeit rcmerge_sort(rl) # From rosetta code
# 1000 loops, best of 3: 350 μs per loop
I'm looking for any suggestions on how to improve this code. I suspect there is a better way to do the mergelist
function, particularly in how I tried to avoid code duplication like:
if top_a <= top_b:
nl.append(top_a)
try:
top_a = next(it_a)
except:
...
else:
# duplicates above code
In my code I placed the iterators and first values in a list, then use the variable k
as index, but this leads to hacks like abs(k-1)
and placing magic numbers 0
and 1
in the code.
def mergesort(l, key=None):
# Split the list into sublists of length 1
sublists = [[x] for x in l]
while len(sublists) > 1:
new_sublists = []
# Create a generator that yields two sublists at a time
sublists_pairs = ((sublists[2*x], sublists[2*x+1])
for x in range(len(sublists)//2))
for a, b in sublists_pairs:
new_sublists.append(mergelist(a, b, key))
# If the length is odd, then there is one sublist that is not merged
if len(sublists) % 2 != 0:
new_sublists.append(sublists[-1])
sublists = new_sublists
return new_sublists[0]
def mergelist(a, b, key=None):
nl = []
# Iterators that yield values from a and b
its = iter(a), iter(b)
# The top of both lists
tops = [next(it) for it in its]
while True:
# Determine the iterator that the next element should be taken from
if key:
k = 0 if key(tops[0]) <= key(tops[1]) else 1
else:
k = 0 if tops[0] <= tops[1] else 1
nl.append(tops[k])
try:
# Update the top of the iterator
tops[k] = next(its[k])
except StopIteration:
# Unless the iterator is empty, in which case get the rest of
# the values from the other iterator
# abs(k-1) is similar to (0 if k == 1 else 1)
nl.append(tops[abs(k-1)])
for e in its[abs(k-1)]:
nl.append(e)
return nl
2 Answers 2
- Using
list(x)
is better than[x]
because it is more explicit. Your while loop is weird, delete
try: # Update the top of the iterator tops[k] = next(its[k]) except StopIteration: # Unless the iterator is empty, in which case get the rest of # the values from the other iterator # abs(k-1) is similar to (0 if k == 1 else 1) nl.append(tops[abs(k-1)]) for e in its[abs(k-1)]: nl.append(e)
and put something like
if (condition):
#thing1
else:
#thing2
break
-
\$\begingroup\$ Thanks for the reply. I think list(x) would make a list from the elements of x, but the program needs to create a list with a single element, which would be done with list as
list((x,))
. Also, I don't understand what should I put in the if statement, can you elaborate? \$\endgroup\$parchment– parchment2014年11月03日 00:19:16 +00:00Commented Nov 3, 2014 at 0:19 -
\$\begingroup\$ The if statement works in simpler cases, I think it is easier to read, in your case, I don't think it can be used, I'm not good at using iterators. \$\endgroup\$Caridorc– Caridorc2014年11月03日 18:33:19 +00:00Commented Nov 3, 2014 at 18:33
The while loop can be written as follows (although I think it does not speed things up):
while True:
# Determine the iterator that the next element should be taken from
if key:
low,high = (0,1) if key(tops[0]) <= key(tops[1]) else (1,0)
else:
low,high = (0,1) if tops[0] <= tops[1] else (1,0)
nl.append(tops[low])
# Update the top of the iterator
tops[low] = next(its[low], None)
if not tops[low]:
# Unless the iterator is empty, in which case get the rest of
# the values from the other iterator
nl.append(tops[high])
nl.extend(its[high])
return nl
-
\$\begingroup\$ Thank you for the reply. If does get rid of the ugly call to abs, but I'm not sure about replacing the try - catch block with an if statement. What if there's a 0 in the list to be sorted? \$\endgroup\$parchment– parchment2014年11月03日 00:13:24 +00:00Commented Nov 3, 2014 at 0:13
-
\$\begingroup\$ Good point. Rewrite as:
if tops[low]==None
\$\endgroup\$Arvind Padmanabhan– Arvind Padmanabhan2014年11月04日 10:53:44 +00:00Commented Nov 4, 2014 at 10:53