I have solved a leetcode problem (Median of Two Sorted Arrays), and I came up with two solutions.
Solution 1
def findMedianSortedArrays(nums1, nums2) -> float:
nums1.extend(nums2)
nums1.sort()
median = 0
num_len = len(nums1)
while((num_len % 2) == 0):
median = (nums1[int((num_len - 1)/2)] + nums1[int(num_len/2)]) / 2
median = nums1[int(num_len/2)]
return median
Solution 2
import statistics
def findMedianSortedArrays(nums1, nums2) -> float:
nums1.extend(nums2)
nums1.sort()
median = 0
median = statistics.median(nums1)
return median
How can I improve the speed?
-
4\$\begingroup\$ How does the first solution ever terminate when the combined list has an even length? \$\endgroup\$Wombatz– Wombatz2022年10月20日 08:27:08 +00:00Commented Oct 20, 2022 at 8:27
-
3\$\begingroup\$ You should be able to do this without modifying the inputs, and without copying either array. \$\endgroup\$Toby Speight– Toby Speight2022年10月20日 10:28:14 +00:00Commented Oct 20, 2022 at 10:28
-
2\$\begingroup\$ You should also be able to do this without re-sorting already sorted arrays. \$\endgroup\$Teepeemm– Teepeemm2022年10月20日 16:06:31 +00:00Commented Oct 20, 2022 at 16:06
4 Answers 4
Assuming that nums1
and nums2
are each already sorted,
nums1.extend(nums2) nums1.sort()
fails to take advantage of that fact. nums1.sort()
could take O(n log n) time, when it could be as fast as O(n) if you merge the two lists manually.
Furthermore, modifying one of the input parameters, especially without clearly documenting it, is unclean programming practice.
In both implementations, median = 0
is a superfluous assignment.
-
\$\begingroup\$ nums1 & nums2 were not sorted so that's why it was sorted. \$\endgroup\$111– 1112022年10月20日 16:45:48 +00:00Commented Oct 20, 2022 at 16:45
-
2\$\begingroup\$ All I had to go on was the name of the task ("Median of Two Sorted Arrays") and the function name (
findMedianSortedArrays
). If it's not the case that the input arrays are already sorted, then my critique is that your function name is misleading. \$\endgroup\$200_success– 200_success2022年10月20日 17:08:17 +00:00Commented Oct 20, 2022 at 17:08 -
\$\begingroup\$ The two sorted arrays are sorted separately so when combing them they would not be sorted and hence need to be sorted. \$\endgroup\$111– 1112022年10月20日 17:45:25 +00:00Commented Oct 20, 2022 at 17:45
-
3\$\begingroup\$ Read my answer carefully. I'm saying that you should be able to take advantage of the fact that the inputs are already individually sorted. \$\endgroup\$200_success– 200_success2022年10月20日 18:10:46 +00:00Commented Oct 20, 2022 at 18:10
You don't need to merge or sort the arrays. But it all depends on your goal. Assume the lengths of the lists are N
and M
. As you are leet code, the "correct" answer is likely the complex solution, but in the real world, the simple and balanced solution is what I would reach for 90% of the time.
Simple
If you want a "reasonable" solution and move on with your life, merge then sort then take the middle is clear, maintainable, and fast "enough" at O((N+M) log(N+M))
.
Balanced
If you care a bit more about performance, you can walk along the two arrays simultaneously, advancing in whichever has the smaller number. When you have consumed half the numbers you are at the median. This is more complex, but doesn't require allocations and is O(N+M)
compute time.
Complex
If you really care about performance and are prepared to write complex code. You can actually be even faster and get down to O(log(N+M))
. This will involve playing with the definition of median, thinking what it looks like to have identified the median elements, and then doing something akin to a binary search. Hint, i+j == (N+M)/2
, roughly, depending on parity of N+M
.
This is more complex, it's harder to maintain, and much more time to work out. But if you need to do this repeatedly, the better asymptotic behaviour (the O()
function) might be worth it.
I agree with @200_success that sorting the lists manually would yield faster results. That being said, here's a quick implementation of what that might look like:
def findMedianSortedArrays(nums1, nums2):
l1 = len(nums1)
l2 = len(nums2)
l3 = 1 + (l1 + l2) / 2
# Pad the input with inf to account for edge cases.
# You can technically use try/catch instead if you reeeeeally want
# to be space efficient, but this just feels nicer to me.
first = nums1 + [float('inf')]*(l3-l1)
second = nums2 + [float('inf')]*(l3-l2)
merged = [None]*l3
# Build the first half of the merged list.
# This will be enough to calculate the median.
i = j = k = 0
while k < l3:
if first[i] < second[j]:
merged[k] = first[i]
i += 1
else:
merged[k] = second[j]
j += 1
k += 1
# A slightly more compact way of calculating the median.
median = [(merged[-1]+merged[-2])/2.0, merged[-1]][(l1+l2)%2]
return median
Edit: Including the suggestion to only walk along the first n/2
steps of the merging cleaned this up a fair bit. Also preserved nums1
and nums2
when padding the lists to be merged.
-
2\$\begingroup\$ You can improve on this, since we only need the middle one or two values in the merged list, so we can throw away the beginning and end. Do you see how that removes the need for extra storage? \$\endgroup\$Toby Speight– Toby Speight2022年10月20日 10:31:39 +00:00Commented Oct 20, 2022 at 10:31
-
7\$\begingroup\$ You don't need a merged list at all; just walk along
n/2
steps then use the next one or two items \$\endgroup\$Jiří Baum– Jiří Baum2022年10月20日 11:37:46 +00:00Commented Oct 20, 2022 at 11:37
The core of this question is not to find the median, it is to merge two arrays efficiently.
If the two arrays are already sorted, just use the merge sort and find the median, the time complexity here will be O(n).
For unsorted arrays, the algorithm you use should be case-by-case. There is no perfect sorting algorithm. For example, if the elements inside the arrays are not big, you can use count sort, it will be O(n).
Ps: the sort()
function in python is implemented with some black magic, so I only talk about general cases.
-
1\$\begingroup\$ I disagree with the first sentence - we can find the median without merging arrays, and that's likely to scale better, due to the reduced memory overheads. \$\endgroup\$Toby Speight– Toby Speight2022年10月21日 06:37:47 +00:00Commented Oct 21, 2022 at 6:37
-
\$\begingroup\$ BTW, I didn't downvote, as I think the rest of the answer is useful. It would be nice if the person who did would have been helpful enough to add a comment saying what could be improved. \$\endgroup\$Toby Speight– Toby Speight2022年10月21日 08:33:59 +00:00Commented Oct 21, 2022 at 8:33
-
\$\begingroup\$ I don't think there is a way to find the median without sorting the arrays. Do you have any idea? \$\endgroup\$Allonsy Jia– Allonsy Jia2022年10月21日 19:55:09 +00:00Commented Oct 21, 2022 at 19:55
-
\$\begingroup\$ See Andrew's answer for hints on algorithms that avoid combining the arrays. I think he's presented that more clearly than I could. \$\endgroup\$Toby Speight– Toby Speight2022年10月22日 08:00:52 +00:00Commented Oct 22, 2022 at 8:00
Explore related questions
See similar questions with these tags.