Calculate the median value of combining two sorted lists

Question 1

I have solved a leetcode problem (Median of Two Sorted Arrays), and I came up with two solutions.

Solution 1

def findMedianSortedArrays(nums1, nums2) -> float:
 nums1.extend(nums2)
 nums1.sort()
 median = 0
 num_len = len(nums1)
 while((num_len % 2) == 0):
 median = (nums1[int((num_len - 1)/2)] + nums1[int(num_len/2)]) / 2
 median = nums1[int(num_len/2)]
 return median

Solution 2

import statistics
def findMedianSortedArrays(nums1, nums2) -> float:
 nums1.extend(nums2)
 nums1.sort()
 median = 0
 median = statistics.median(nums1)
 return median

How can I improve the speed?

Question 2

How does the first solution ever terminate when the combined list has an even length?

Question 3

You should be able to do this without modifying the inputs, and without copying either array.

Question 4

You should also be able to do this without re-sorting already sorted arrays.

Question 5

Assuming that nums1 and nums2 are each already sorted,

nums1.extend(nums2)
nums1.sort()

fails to take advantage of that fact. nums1.sort() could take O(n log n) time, when it could be as fast as O(n) if you merge the two lists manually.

Furthermore, modifying one of the input parameters, especially without clearly documenting it, is unclean programming practice.

In both implementations, median = 0 is a superfluous assignment.

Question 6

nums1 & nums2 were not sorted so that's why it was sorted.

Question 7

All I had to go on was the name of the task ("Median of Two Sorted Arrays") and the function name (findMedianSortedArrays). If it's not the case that the input arrays are already sorted, then my critique is that your function name is misleading.

Question 8

The two sorted arrays are sorted separately so when combing them they would not be sorted and hence need to be sorted.

Question 9

Read my answer carefully. I'm saying that you should be able to take advantage of the fact that the inputs are already individually sorted.

Question 10

You don't need to merge or sort the arrays. But it all depends on your goal. Assume the lengths of the lists are N and M. As you are leet code, the "correct" answer is likely the complex solution, but in the real world, the simple and balanced solution is what I would reach for 90% of the time.

Simple

If you want a "reasonable" solution and move on with your life, merge then sort then take the middle is clear, maintainable, and fast "enough" at O((N+M) log(N+M)).

Balanced

If you care a bit more about performance, you can walk along the two arrays simultaneously, advancing in whichever has the smaller number. When you have consumed half the numbers you are at the median. This is more complex, but doesn't require allocations and is O(N+M) compute time.

Complex

If you really care about performance and are prepared to write complex code. You can actually be even faster and get down to O(log(N+M)). This will involve playing with the definition of median, thinking what it looks like to have identified the median elements, and then doing something akin to a binary search. Hint, i+j == (N+M)/2, roughly, depending on parity of N+M.

This is more complex, it's harder to maintain, and much more time to work out. But if you need to do this repeatedly, the better asymptotic behaviour (the O() function) might be worth it.

Question 11

I agree with @200_success that sorting the lists manually would yield faster results. That being said, here's a quick implementation of what that might look like:

def findMedianSortedArrays(nums1, nums2):
 l1 = len(nums1)
 l2 = len(nums2)
 l3 = 1 + (l1 + l2) / 2
 # Pad the input with inf to account for edge cases.
 # You can technically use try/catch instead if you reeeeeally want
 # to be space efficient, but this just feels nicer to me.
 first = nums1 + [float('inf')]*(l3-l1)
 second = nums2 + [float('inf')]*(l3-l2)
 merged = [None]*l3
 # Build the first half of the merged list.
 # This will be enough to calculate the median.
 i = j = k = 0
 while k < l3:
 if first[i] < second[j]:
 merged[k] = first[i]
 i += 1
 else:
 merged[k] = second[j]
 j += 1
 k += 1
 # A slightly more compact way of calculating the median.
 median = [(merged[-1]+merged[-2])/2.0, merged[-1]][(l1+l2)%2]
 return median

Edit: Including the suggestion to only walk along the first n/2 steps of the merging cleaned this up a fair bit. Also preserved nums1 and nums2 when padding the lists to be merged.

Question 12

You can improve on this, since we only need the middle one or two values in the merged list, so we can throw away the beginning and end. Do you see how that removes the need for extra storage?

Question 13

You don't need a merged list at all; just walk along n/2 steps then use the next one or two items

Question 14

The core of this question is not to find the median, it is to merge two arrays efficiently.

If the two arrays are already sorted, just use the merge sort and find the median, the time complexity here will be O(n).

For unsorted arrays, the algorithm you use should be case-by-case. There is no perfect sorting algorithm. For example, if the elements inside the arrays are not big, you can use count sort, it will be O(n).

Ps: the sort() function in python is implemented with some black magic, so I only talk about general cases.

Question 15

I disagree with the first sentence - we can find the median without merging arrays, and that's likely to scale better, due to the reduced memory overheads.

Question 16

BTW, I didn't downvote, as I think the rest of the answer is useful. It would be nice if the person who did would have been helpful enough to add a comment saying what could be improved.

Question 17

I don't think there is a way to find the median without sorting the arrays. Do you have any idea?

Question 18

See Andrew's answer for hints on algorithms that avoid combining the arrays. I think he's presented that more clearly than I could.

200_success 200_success 145k22 gold badges190 silver badges478 bronze badges · Answer 1 · 2022-10-19 22:44:59Z

9

\$\begingroup\$

Assuming that nums1 and nums2 are each already sorted,

nums1.extend(nums2)
nums1.sort()

fails to take advantage of that fact. nums1.sort() could take O(n log n) time, when it could be as fast as O(n) if you merge the two lists manually.

Furthermore, modifying one of the input parameters, especially without clearly documenting it, is unclean programming practice.

In both implementations, median = 0 is a superfluous assignment.

Share

answered Oct 19, 2022 at 22:44

200_success's user avatar

200_success 200_success

145k22 gold badges190 silver badges478 bronze badges

\$\endgroup\$

4

\$\begingroup\$ nums1 & nums2 were not sorted so that's why it was sorted. \$\endgroup\$

111
– 111

2022年10月20日 16:45:48 +00:00
Commented Oct 20, 2022 at 16:45
2

\$\begingroup\$ All I had to go on was the name of the task ("Median of Two Sorted Arrays") and the function name (findMedianSortedArrays). If it's not the case that the input arrays are already sorted, then my critique is that your function name is misleading. \$\endgroup\$

200_success
– 200_success

2022年10月20日 17:08:17 +00:00
Commented Oct 20, 2022 at 17:08
\$\begingroup\$ The two sorted arrays are sorted separately so when combing them they would not be sorted and hence need to be sorted. \$\endgroup\$

111
– 111

2022年10月20日 17:45:25 +00:00
Commented Oct 20, 2022 at 17:45
3

\$\begingroup\$ Read my answer carefully. I'm saying that you should be able to take advantage of the fact that the inputs are already individually sorted. \$\endgroup\$

200_success
– 200_success

2022年10月20日 18:10:46 +00:00
Commented Oct 20, 2022 at 18:10

Add a comment |

Andrew Cramer Andrew Cramer 312 bronze badges · Answer 2 · 2022-10-21 00:43:32Z

You don't need to merge or sort the arrays. But it all depends on your goal. Assume the lengths of the lists are N and M. As you are leet code, the "correct" answer is likely the complex solution, but in the real world, the simple and balanced solution is what I would reach for 90% of the time.

Simple

If you want a "reasonable" solution and move on with your life, merge then sort then take the middle is clear, maintainable, and fast "enough" at O((N+M) log(N+M)).

Balanced

If you care a bit more about performance, you can walk along the two arrays simultaneously, advancing in whichever has the smaller number. When you have consumed half the numbers you are at the median. This is more complex, but doesn't require allocations and is O(N+M) compute time.

Complex

If you really care about performance and are prepared to write complex code. You can actually be even faster and get down to O(log(N+M)). This will involve playing with the definition of median, thinking what it looks like to have identified the median elements, and then doing something akin to a binary search. Hint, i+j == (N+M)/2, roughly, depending on parity of N+M.

This is more complex, it's harder to maintain, and much more time to work out. But if you need to do this repeatedly, the better asymptotic behaviour (the O() function) might be worth it.

GotCubes GotCubes 112 bronze badges · Answer 3 · 2022-10-20 10:13:08Z

I agree with @200_success that sorting the lists manually would yield faster results. That being said, here's a quick implementation of what that might look like:

def findMedianSortedArrays(nums1, nums2):
 l1 = len(nums1)
 l2 = len(nums2)
 l3 = 1 + (l1 + l2) / 2
 # Pad the input with inf to account for edge cases.
 # You can technically use try/catch instead if you reeeeeally want
 # to be space efficient, but this just feels nicer to me.
 first = nums1 + [float('inf')]*(l3-l1)
 second = nums2 + [float('inf')]*(l3-l2)
 merged = [None]*l3
 # Build the first half of the merged list.
 # This will be enough to calculate the median.
 i = j = k = 0
 while k < l3:
 if first[i] < second[j]:
 merged[k] = first[i]
 i += 1
 else:
 merged[k] = second[j]
 j += 1
 k += 1
 # A slightly more compact way of calculating the median.
 median = [(merged[-1]+merged[-2])/2.0, merged[-1]][(l1+l2)%2]
 return median

Edit: Including the suggestion to only walk along the first n/2 steps of the merging cleaned this up a fair bit. Also preserved nums1 and nums2 when padding the lists to be merged.

You can improve on this, since we only need the middle one or two values in the merged list, so we can throw away the beginning and end. Do you see how that removes the need for extra storage?
You don't need a merged list at all; just walk along n/2 steps then use the next one or two items

Allonsy Jia Allonsy Jia 1011 bronze badge · Answer 4 · 2022-10-20 20:45:21Z

-1

\$\begingroup\$

The core of this question is not to find the median, it is to merge two arrays efficiently.

If the two arrays are already sorted, just use the merge sort and find the median, the time complexity here will be O(n).

For unsorted arrays, the algorithm you use should be case-by-case. There is no perfect sorting algorithm. For example, if the elements inside the arrays are not big, you can use count sort, it will be O(n).

Ps: the sort() function in python is implemented with some black magic, so I only talk about general cases.

Share

answered Oct 20, 2022 at 20:45

Allonsy Jia's user avatar

Allonsy Jia Allonsy Jia

1011 bronze badge

\$\endgroup\$

4

1

\$\begingroup\$ I disagree with the first sentence - we can find the median without merging arrays, and that's likely to scale better, due to the reduced memory overheads. \$\endgroup\$

Toby Speight
– Toby Speight

2022年10月21日 06:37:47 +00:00
Commented Oct 21, 2022 at 6:37
\$\begingroup\$ BTW, I didn't downvote, as I think the rest of the answer is useful. It would be nice if the person who did would have been helpful enough to add a comment saying what could be improved. \$\endgroup\$

Toby Speight
– Toby Speight

2022年10月21日 08:33:59 +00:00
Commented Oct 21, 2022 at 8:33
\$\begingroup\$ I don't think there is a way to find the median without sorting the arrays. Do you have any idea? \$\endgroup\$

Allonsy Jia
– Allonsy Jia

2022年10月21日 19:55:09 +00:00
Commented Oct 21, 2022 at 19:55
\$\begingroup\$ See Andrew's answer for hints on algorithms that avoid combining the arrays. I think he's presented that more clearly than I could. \$\endgroup\$

Toby Speight
– Toby Speight

2022年10月22日 08:00:52 +00:00
Commented Oct 22, 2022 at 8:00

Add a comment |

Stack Exchange Network

Calculate the median value of combining two sorted lists

Solution 1

Solution 2

4 Answers 4

Simple

Balanced

Complex

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Calculate the median value of combining two sorted lists

Solution 1

Solution 2

4 Answers 4

Simple

Balanced

Complex

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions