Mergesort in python

Question 1

My mergesort implementation currently takes a very long time, for a list with 1 million elements it takes over 130 seconds to get the list sorted.

Could someone kindly have a look at my code and suggest what could I do to improve it?
Is there anything particular in my code that is taking significantly long?

Code

def splitlist(L): #splits the list to a list of individual listed elements (e.g. [1,2] to [[1],[2]])
 envelop = lambda x: [x]
 return(list(map(envelop,L)))
def merge(L_1,L_2): #merges two (already sorted) lists to one sorted list
 N = []
 while len(L_1) > 0 and len(L_2) > 0:
 if L_1[0] > L_2[0]:
 N += [L_2.pop(0)]
 else:
 N += [L_1.pop(0)]
 if len(L_1) == 0:
 N += L_2
 else:
 N += L_1
 
 return(N)
#performs one round of pairwise merges (e.g. [[2],[1],[4],[3]] to [[1,2],[3,4]]), or [[5,10],[1,8],[2,3]] to [[1,2,3,5,8,10]]) 
def mergelist(L): 
 N = []
 if len(L) % 2 == 0:
 for i in range(0,len(L)//2):
 N += [merge(L[2*i],L[2*i + 1])]
 else:
 for i in range(0,len(L)//2 - 1):
 N += [merge(L[2*i],L[2*i + 1])]
 N += [merge(merge(L[-3],L[-2]),L[-1])]
 
 return(N)
def mergesort(L): #recursively performs mergelist until there is only 1 sorted list remaining
 L = splitlist(L)
 while len(L) > 1:
 L = mergelist(L)
 return(L[0])

Here is my code for generating the million elements:

rlist = random.sample(range(0,2000000),1000000)

Question 2

You could use numpy.array instead of lists, those are better optimized.

Question 3

The pop(0) takes linear time. Do that differently, in O(1) time. The standard way uses index variables. See this question's answers for some more pythonic ways. Or you could merge from right to left, using pop(), and then in the end reverse() the result.

One way to do the latter:

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 N = []
 while L1 and L2:
 L = L1 if L1[-1] > L2[-1] else L2
 N.append(L.pop())
 N.reverse()
 N[:0] = L1 or L2
 return N

Other changes I did and which you can apply in the other parts of your code as well:

Removed the underscores from the variables, I think it reads better. I kept them upper case because for L, that's what PEP 8 says. And then I kept N for consistency. Usually I'd use result or maybe merged. Don't know why you chose N. If you have a meaningful word that starts with "n", then I suggest using that.
Space between the function parameters.
Proper docstring format instead of comment.
Replaced len(L_1) > 0 with the normal L1 non-emptiness check.
Replaced N += [x] with the normal N.append(x).

Just another way, replacing that one "long" line to determine L with a clearer but slower way:

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 N = []
 def last(L):
 return L[-1]
 while L1 and L2:
 L = max(L2, L1, key=last)
 N.append(L.pop())
 N.reverse()
 N[:0] = L1 or L2
 return N

For some fun, two list comprehension hacks:

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 def end(L):
 return L[-1:]
 return [max(L2, L1, key=end).pop() for _ in L1 + L2][::-1]

def merge(L, R):
 """Merges two (already sorted) lists to one sorted list."""
 return [(R, L)[L[-1:] > R[-1:]].pop() for _ in L + R][::-1]

And I don't want to leave without a much faster way:

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 return sorted(L1 + L2)

That's O(n) because of Timsort. And fast O(n) because of C code. If you think using the mighty sorted inside a mergesort defeats the purpose of writing the mergesort in the first place: Even that can be meaningful, if you're not just doing mergesort. At least once I wrote a mergesort with embedded counting of something, and I indeed used sorted just for the merging. Because that made my solution faster/shorter/simpler.

Even more efficient (both space and time):

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 L1 += L2
 L1.sort()
 return L1

(If L2 can be longer than L1, it might be advantageous to insert L1 into L2 instead.)

Question 4

This is great thank you. Managed to get it to just over 10 seconds by merging from right to left

Question 5

Um, could someone tell me why this deserves a downvote? How is this not useful?

Question 6

(It was downvoted when it was only the first paragraph, before I added the rest. Still wondering how that first paragraph fits the "This answer is not useful" that the downvote button represents.)

Question 7

For the list comprehension, what does the underscore in the last line mean?

Question 8

@NishilPatel That's the conventional placeholder variable name for when you don't actually use the value. Here I don't. I just do L1 + L2 to get the number of elements, so that the list comprehension has the correct length. At that place, I don't care what the elements are, so I assign them to _.

superb rain superb rainsuperb rain 4,0017 silver badges22 bronze badges · Accepted Answer · 2020-09-16 15:03:09Z

The pop(0) takes linear time. Do that differently, in O(1) time. The standard way uses index variables. See this question's answers for some more pythonic ways. Or you could merge from right to left, using pop(), and then in the end reverse() the result.

One way to do the latter:

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 N = []
 while L1 and L2:
 L = L1 if L1[-1] > L2[-1] else L2
 N.append(L.pop())
 N.reverse()
 N[:0] = L1 or L2
 return N

Other changes I did and which you can apply in the other parts of your code as well:

Removed the underscores from the variables, I think it reads better. I kept them upper case because for L, that's what PEP 8 says. And then I kept N for consistency. Usually I'd use result or maybe merged. Don't know why you chose N. If you have a meaningful word that starts with "n", then I suggest using that.
Space between the function parameters.
Proper docstring format instead of comment.
Replaced len(L_1) > 0 with the normal L1 non-emptiness check.
Replaced N += [x] with the normal N.append(x).

Just another way, replacing that one "long" line to determine L with a clearer but slower way:

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 N = []
 def last(L):
 return L[-1]
 while L1 and L2:
 L = max(L2, L1, key=last)
 N.append(L.pop())
 N.reverse()
 N[:0] = L1 or L2
 return N

For some fun, two list comprehension hacks:

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 def end(L):
 return L[-1:]
 return [max(L2, L1, key=end).pop() for _ in L1 + L2][::-1]

def merge(L, R):
 """Merges two (already sorted) lists to one sorted list."""
 return [(R, L)[L[-1:] > R[-1:]].pop() for _ in L + R][::-1]

And I don't want to leave without a much faster way:

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 return sorted(L1 + L2)

That's O(n) because of Timsort. And fast O(n) because of C code. If you think using the mighty sorted inside a mergesort defeats the purpose of writing the mergesort in the first place: Even that can be meaningful, if you're not just doing mergesort. At least once I wrote a mergesort with embedded counting of something, and I indeed used sorted just for the merging. Because that made my solution faster/shorter/simpler.

Even more efficient (both space and time):

def merge(L1, L2):
 """Merges two (already sorted) lists to one sorted list."""
 L1 += L2
 L1.sort()
 return L1

(If L2 can be longer than L1, it might be advantageous to insert L1 into L2 instead.)

This is great thank you. Managed to get it to just over 10 seconds by merging from right to left
Um, could someone tell me why this deserves a downvote? How is this not useful?
(It was downvoted when it was only the first paragraph, before I added the rest. Still wondering how that first paragraph fits the "This answer is not useful" that the downvote button represents.)
For the list comprehension, what does the underscore in the last line mean?
@NishilPatel That's the conventional placeholder variable name for when you don't actually use the value. Here I don't. I just do L1 + L2 to get the number of elements, so that the list comprehension has the correct length. At that place, I don't care what the elements are, so I assign them to _.

Stack Exchange Network

Mergesort in python

Code

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Mergesort in python

Code

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions