My mergesort implementation currently takes a very long time, for a list with 1 million elements it takes over 130 seconds to get the list sorted.
Could someone kindly have a look at my code and suggest what could I do to improve it?
Is there anything particular in my code that is taking significantly long?
Code
def splitlist(L): #splits the list to a list of individual listed elements (e.g. [1,2] to [[1],[2]])
envelop = lambda x: [x]
return(list(map(envelop,L)))
def merge(L_1,L_2): #merges two (already sorted) lists to one sorted list
N = []
while len(L_1) > 0 and len(L_2) > 0:
if L_1[0] > L_2[0]:
N += [L_2.pop(0)]
else:
N += [L_1.pop(0)]
if len(L_1) == 0:
N += L_2
else:
N += L_1
return(N)
#performs one round of pairwise merges (e.g. [[2],[1],[4],[3]] to [[1,2],[3,4]]), or [[5,10],[1,8],[2,3]] to [[1,2,3,5,8,10]])
def mergelist(L):
N = []
if len(L) % 2 == 0:
for i in range(0,len(L)//2):
N += [merge(L[2*i],L[2*i + 1])]
else:
for i in range(0,len(L)//2 - 1):
N += [merge(L[2*i],L[2*i + 1])]
N += [merge(merge(L[-3],L[-2]),L[-1])]
return(N)
def mergesort(L): #recursively performs mergelist until there is only 1 sorted list remaining
L = splitlist(L)
while len(L) > 1:
L = mergelist(L)
return(L[0])
Here is my code for generating the million elements:
rlist = random.sample(range(0,2000000),1000000)
1 Answer 1
The pop(0)
takes linear time. Do that differently, in O(1) time. The standard way uses index variables. See this question's answers for some more pythonic ways. Or you could merge from right to left, using pop()
, and then in the end reverse()
the result.
One way to do the latter:
def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
N = []
while L1 and L2:
L = L1 if L1[-1] > L2[-1] else L2
N.append(L.pop())
N.reverse()
N[:0] = L1 or L2
return N
Other changes I did and which you can apply in the other parts of your code as well:
- Removed the underscores from the variables, I think it reads better. I kept them upper case because for
L
, that's what PEP 8 says. And then I keptN
for consistency. Usually I'd useresult
or maybemerged
. Don't know why you choseN
. If you have a meaningful word that starts with "n", then I suggest using that. - Space between the function parameters.
- Proper docstring format instead of comment.
- Replaced
len(L_1) > 0
with the normalL1
non-emptiness check. - Replaced
N += [x]
with the normalN.append(x)
.
Just another way, replacing that one "long" line to determine L
with a clearer but slower way:
def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
N = []
def last(L):
return L[-1]
while L1 and L2:
L = max(L2, L1, key=last)
N.append(L.pop())
N.reverse()
N[:0] = L1 or L2
return N
For some fun, two list comprehension hacks:
def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
def end(L):
return L[-1:]
return [max(L2, L1, key=end).pop() for _ in L1 + L2][::-1]
def merge(L, R):
"""Merges two (already sorted) lists to one sorted list."""
return [(R, L)[L[-1:] > R[-1:]].pop() for _ in L + R][::-1]
And I don't want to leave without a much faster way:
def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
return sorted(L1 + L2)
That's O(n) because of Timsort. And fast O(n) because of C code. If you think using the mighty sorted
inside a mergesort defeats the purpose of writing the mergesort in the first place: Even that can be meaningful, if you're not just doing mergesort. At least once I wrote a mergesort with embedded counting of something, and I indeed used sorted
just for the merging. Because that made my solution faster/shorter/simpler.
Even more efficient (both space and time):
def merge(L1, L2):
"""Merges two (already sorted) lists to one sorted list."""
L1 += L2
L1.sort()
return L1
(If L2
can be longer than L1
, it might be advantageous to insert L1
into L2
instead.)
-
1\$\begingroup\$ This is great thank you. Managed to get it to just over 10 seconds by merging from right to left \$\endgroup\$Nishil Patel– Nishil Patel2020年09月16日 15:13:55 +00:00Commented Sep 16, 2020 at 15:13
-
1\$\begingroup\$ Um, could someone tell me why this deserves a downvote? How is this not useful? \$\endgroup\$superb rain– superb rain2020年09月16日 15:18:35 +00:00Commented Sep 16, 2020 at 15:18
-
1\$\begingroup\$ (It was downvoted when it was only the first paragraph, before I added the rest. Still wondering how that first paragraph fits the "This answer is not useful" that the downvote button represents.) \$\endgroup\$superb rain– superb rain2020年09月16日 16:03:45 +00:00Commented Sep 16, 2020 at 16:03
-
\$\begingroup\$ For the list comprehension, what does the underscore in the last line mean? \$\endgroup\$Nishil Patel– Nishil Patel2020年09月16日 16:59:11 +00:00Commented Sep 16, 2020 at 16:59
-
1\$\begingroup\$ @NishilPatel That's the conventional placeholder variable name for when you don't actually use the value. Here I don't. I just do
L1 + L2
to get the number of elements, so that the list comprehension has the correct length. At that place, I don't care what the elements are, so I assign them to_
. \$\endgroup\$superb rain– superb rain2020年09月16日 17:19:12 +00:00Commented Sep 16, 2020 at 17:19
Explore related questions
See similar questions with these tags.
numpy.array
instead of lists, those are better optimized. \$\endgroup\$