Inputs are two sorted lists (of the same length) of numbers. I would like to return a new merged, sorted new list. I have written the following code, but most lines are used to deal with the end case. I am wondering if there is any better way to write it.
def merge(array1,array2):
result = [0]*len(array1)*2
i = 0 # left index
j = 0 # right index
for k in range(0,len(result)):
# when none of the pointer is at the end of the list
if i != len(array1)-1 and j != len(array2)-1:
if array1[i] < array2[j]:
result[k] = array1[i]
i = i + 1
elif array1[i] > array2[j]:
result[k] = array2[j]
j = j + 1
# the following codes are used to deal with the end cases.
# possible to write it more compactly?
elif i == len(array1)-1:
if j > len(array2)-1:
result[-1] = array1[-1]
return result
elif array1[i] < array2[j]:
result[k] = array1[i]
result[k+1:] = array2[j:]
return result
else:
result[k] = array2[j]
j = j + 1
elif j == len(array2)-1:
if i > len(array1)-1:
result[-1] = array2[-1]
elif array2[j] < array1[i]:
result[k] = array2[j]
result[(k+1):] = array1[i:]
return result
else:
result[k] = array1[i]
i = i + 1
return result
5 Answers 5
If you only want to support same-length array, you should do so explicitly, either by returning and empty list or an error code
It's harder to read if you have to go back in the code to check what
i
,j
andk
mean.
I find it's better to remove the comment and rename the variables to a more significant name:
left_index = 0
right_index = 0
for result_index in range(0,len(result)):
This means maybe you could also rename array1
and array2
to left_array
and right_array
If you keep using the result of a function, just store it. Also, the length of the two arrays is supposed to be the same, so no need to make a distinction between
len(array1)
andlen(array2)
This check is easier to read if you invert it, leaving this as the
else
case.
Something like:
# the following codes are used to deal with the end cases.
# possible to write it more compactly?
if left_index == len(left_array)-1:
[...]
elif right_index == len(right_index)-1:
[...]
else:
if left_array[left_index] < right_index[right_index]:
result[merged_index] = left_array[left_index]
left_index = left_index + 1
elif left_array[left_index] > right_index[right_index]:
result[merged_index] = right_index[right_index]
right_index = right_index + 1
return result
But, as @Simon said, you don't need all that code, because you're putting a lot of restrictions on the input data. They have to be the same length and the have to be sorted. Something like this should also work:
def merge(left_array, right_array):
if (len(left_array) != len(right_array)):
return []
array_length = len(left_array)
result_length = array_length * 2
result = [0] * result_length
left_index = 0
right_index = 0
for result_index in range(0, result_length):
if (left_index < array_length) and (left_array[left_index] <= right_array[right_index]):
result[result_index] = left_array[left_index]
result[result_index+1:] = right_array[right_index:]
left_index += 1
elif (right_index < array_length):
result[result_index] = right_array[right_index]
right_index += 1
return result
-
\$\begingroup\$ Some of your
right_index
should really beright_array
. \$\endgroup\$Graipher– Graipher2016年12月05日 18:16:19 +00:00Commented Dec 5, 2016 at 18:16 -
\$\begingroup\$
result[result_index+1:] = right_array[right_index:]
If I have[1,2,5]
and[3,4,6]
, then in the first iteration I would actually have[1,3,4,6,]
and in the second iteration, it will become[1,2,4,6]
? \$\endgroup\$Yuki Kawabata– Yuki Kawabata2016年12月06日 00:21:15 +00:00Commented Dec 6, 2016 at 0:21 -
\$\begingroup\$ @iseliget You can see how it puts elements in the array by adding a print :-) It will be
[1, 3, 4, 6]
,[1, 2, 3, 4, 6]
,[1, 2, 3, 4, 6]
,[1, 2, 3, 4, 6]
,[1, 2, 3, 4, 5, 6]
,[1, 2, 3, 4, 5, 6]
. So there will be some "wasted" operations but that's to be expected because you can't know in advance if/where you have gaps in the sequences. \$\endgroup\$ChatterOne– ChatterOne2016年12月06日 08:07:54 +00:00Commented Dec 6, 2016 at 8:07
You have a serious bug here:
if i != len(array1)-1 and j != len(array2)-1: if array1[i] < array2[j]: result[k] = array1[i] i = i + 1 elif array1[i] > array2[j]: result[k] = array2[j] j = j + 1
You have an if a < b
case and an elif a> b
case. Ask yourself, are there any other possibilities for the relationship between a
and b
? What happens then?
-
\$\begingroup\$ (Fun or not, there are else-statements in the end-elif statements.) \$\endgroup\$greybeard– greybeard2016年12月05日 08:22:22 +00:00Commented Dec 5, 2016 at 8:22
-
1\$\begingroup\$
a == b
is one possibility? \$\endgroup\$EKons– EKons2016年12月05日 12:21:12 +00:00Commented Dec 5, 2016 at 12:21 -
1\$\begingroup\$ True. Should have changed to
if left_array[left_index] <= right_array[right_index]
. Thanks! \$\endgroup\$Yuki Kawabata– Yuki Kawabata2016年12月06日 01:02:59 +00:00Commented Dec 6, 2016 at 1:02
I'll answer your question by providing an alternative solution. IDK why you're over complicating this when everything that has to be done is to combine the two lists, then simply sort them:
def merge(array1, array2):
array1.extend(array2)
return sorted(array1)
print(merge([1, 3, 4, 7], [0, 2, 5, 6, 8, 9])) > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
The advantages of the above solution are:
- there's less code so it's easier to maintain / read
- it's using only two built-in functions (so assuming the lists are of a reasonable size, it should be quicker than implementing the sorting/merging in a loop)
There's also a one-liner which is even simpler than the above solution:
def merge(array1, array2):
return sorted(array1 + array2)
print(merge([1, 3, 4, 7], [0, 2, 5, 6, 8, 9])) > [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
You could also use the built-in heapq.merge for this.
Of course, if the lists are very large (like very large), merging two lists will totally make sense.
According to this SO answer unless len(array1 + array2)
~ 1000000 use:
L = array1 + array2
L.sort()
Those are just my 2-cents regarding the subject. I'll let somebody else comment on your code.
Extra: Even if practicing algorithms is a good idea, an even nicer aspect is knowing when / where to apply them.
-
\$\begingroup\$ It's good that you are considering performance benefits or costs for either tactic when weighing those against the simplicity and readability of the code. \$\endgroup\$klaar– klaar2016年12月05日 10:51:13 +00:00Commented Dec 5, 2016 at 10:51
-
1\$\begingroup\$ Shouldn't sorting appended lists be faster no matter what? There is the same memory cost, and timsort should be able to use the two ascending groups to make the sort just be a merge function written in C. Am I missing something? \$\endgroup\$Oscar Smith– Oscar Smith2016年12月05日 16:52:53 +00:00Commented Dec 5, 2016 at 16:52
You have a lot of logic in your code. It is true that there are some different end cases for the merge, but they really all depend on that that some of the arrays don't hold more values.
Some code style
This:
i = 0 # left index
j = 0 # right index
could be:
# left index, right index
i = j = 0
but at least:
# left index
i = 0
# right index
j = 0
incrementing
i = i + 1
is the same as :
i += 1
Redundant parentheses:
result[(k+1):] = array1[i:]
is the same as:
result[k+1:] = array1[i:]
Redundant start value:
for k in range(0,len(result)):
...
is the same as :
for k in range(len(result)):
...
If the list has unique values:
if array1[i] < array2[j]:
result[k] = array1[i]
i = i + 1
elif array1[i] > array2[j]:
result[k] = array2[j]
j = j + 1
could be:
if array1[i] < array2[j]:
result[k] = array1[i]
i += 1
else:
result[k] = array2[j]
j += 1
I must admit that I didn't commit to reading your end logic, but the rest of it was that you have special cases for where you slice. Well, there is only one place we really need to slice:
def get_mearged(a, b):
assert len(a) == len(b)
length = len(a)
merged_list = [0]*(length*2)
a_i = b_i = 0
for i in range(len(merged_list)):
if a[a_i] > b[b_i]:
merged_list[i] = b[b_i]
if b_i+2 > length:
return merged_list[:i+1] + a[a_i:]
b_i += 1
else:
merged_list[i] = a[a_i]
if a_i+2 > length:
return merged_list[:i+1] + b[b_i:]
a_i += 1
One of my favourite interview questions: The objective from the question (when I ask it) is to take advantage of the fact that the input arrays are sorted and in theory could be larger than memory, streaming etc. So we are looking for linear complexity. So joining and sorting is less efficient timewise.
You are correct that you seem to be wasting too much time checking for end state conditions in your main loop and is why I'd suggest you move those edges outside the main loop.
The way I approached this problem is to look at it as if the numbers were written down on paper and we had to do this by hand, this turns out to be a very efficient solution. Use two fingers, one on each list, move down picking the least one till one gets to the bottom first, then just copy the rest of the remaining list.
As the question asks for a potentially better way, here it is coded:
def merge(array1,array2):
result = [0]*(len(array1)+len(array2)) # cater for different size inputs
p1 = p2 = r = 0
# while each pointer/finger is on a list..
while (p1<len(array1) and p2<len(array2)):
# which one is smaller, add that to result
if array1[p1]<array2[p2]:
result[r] = array1[p1]
p1 += 1
else:
result[r] = array2[p2]
p2 += 1
r += 1
# at least one finger has reached the end of its list
# only one of the following while loops will execute
# if array1 still has elements in, copy the rest
while (p1<len(array1)):
result[r] = array1[p1]
p1 += 1
r += 1
# if array2 still has elements in, copy the rest
while (p2<len(array2)):
result[r] = array2[p2]
p2 += 1
r += 1
return result
-
\$\begingroup\$ rather than result to a full list, why not initialize to a empty list and append in the first loop? Then you can handle the trailing condition with a slice (rather than a loop). \$\endgroup\$Martin Bonner supports Monica– Martin Bonner supports Monica2016年12月05日 12:40:52 +00:00Commented Dec 5, 2016 at 12:40
-
\$\begingroup\$ @MartinBonner - good point, but I assume this is a massive array and rather than re-dimensioning it by pushing onto a stack, I allocate the space once. However, it would be trivial (and perhaps better) to do a single split and slice at the end rather than iterating. \$\endgroup\$just.jules– just.jules2016年12月05日 12:45:44 +00:00Commented Dec 5, 2016 at 12:45
-
1\$\begingroup\$ You have presented an alternative solution, but haven't reviewed the code. Please explain your reasoning (how your solution works and why it is better than the original) so that the author and other readers can learn from your thought process. \$\endgroup\$2016年12月05日 16:08:45 +00:00Commented Dec 5, 2016 at 16:08
-
\$\begingroup\$ thanks @Mast, I've clarified the code algorithm review and added some more detail to take the reader through the thought process and example, comments welcome! \$\endgroup\$just.jules– just.jules2016年12月05日 21:26:18 +00:00Commented Dec 5, 2016 at 21:26
-
\$\begingroup\$ Are you sure your code actually compiles and run? I don't think you tried it. For example, you have a
&&
instead ofand
, you're missing a:
at theelse
line, you're using a variable calledarray
which is not defined and you probably wantreturn result
instead of justresult
. \$\endgroup\$ChatterOne– ChatterOne2016年12月06日 08:16:41 +00:00Commented Dec 6, 2016 at 8:16
Explore related questions
See similar questions with these tags.
extend
orsorted
? \$\endgroup\$reinventing-the-wheel
to your question, otherwise everyone is just going to tell you the same thing. \$\endgroup\$