Problem Statement
(Source: Leetcode Problem 4: Median of Two Sorted Arrays [Hard])
(Topics: [Array] [Binary Search] [Divide and Conquer])
Given two sorted arrays
nums1
andnums2
of sizem
andn
respectively, return the median of the two sorted arrays.The overall time complexity should be
O(log(m+n))
.
I was able to solve this problem with the following code:
def findMedianSortedArrays(self, nums1, nums2):
nums = nums1 + nums2
nums.sort()
if len(nums) % 2 == 0:
return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0
else:
return nums[len(nums) // 2]
which has a time complexity of \$O((m+n)\log(m+n))\$ and space complexity of \$O(m+n)\$.
Admittedly while I didn't find the problem too difficult overall, I do want to still try to improve the time complexity of my code to at least \$O(\log(m+n))\$.
My question is:
What am I misunderstanding about the approach needed to make the code run in \$O(\log(m+n))\$ time?
Edit
Per toolic's suggestion, I have modified the code to run on newer versions of Python (of course also using snake_case
to write function names for readability). I used this online Python editor to test my fixes:
def find_median_two_sorted_arrays(nums1: list, nums2: list):
nums = nums1 + nums2
nums.sort
if len(nums) % 2 == 0:
return (nums[(len(nums) // 2) - 1] + nums[len(nums) // 2]) / 2.0
else:
return nums[len(nums) // 2]
Note: I would like to say that I know (after doing some research) that this can be done in \$O(\log(\min(n,m)))\$ time complexity with \$O(1)\$ space complexity. However, this is not what I want as this is admittedly very out of my reach with my understanding of Python.
-
2\$\begingroup\$ A O(log(min(m, n))) approach has been mentioned here: codereview.stackexchange.com/a/202358/35991. \$\endgroup\$Martin R– Martin R2024年12月06日 17:07:32 +00:00Commented Dec 6, 2024 at 17:07
-
2\$\begingroup\$ @TobySpeight The time-limit-exceeded tag states "You may use this tag instead of performance when an online judge rejects your solution to a programming-challenge due to time constraints". We explicitly allow code which doesn't scale. \$\endgroup\$Peilonrayz– Peilonrayz ♦2024年12月09日 01:29:06 +00:00Commented Dec 9, 2024 at 1:29
-
\$\begingroup\$ The crucial point of the task is finding a median of two arrays, not a single array obtained by joining the two. \$\endgroup\$CiaPan– CiaPan2024年12月11日 18:37:54 +00:00Commented Dec 11, 2024 at 18:37
2 Answers 2
mentioned that \$O(\log \min(n,m))\$ would be really outside of what I can understand with my current knowledge.
Nah! It's really about the sizes \$n, m\$, rather than about the algorithm. If one is of comparable magnitude to the other, then \$O(\min(n, m))\$ is simply \$O(n)\$, a case that's easily dealt with. For the \$\min\$ to be interesting, one of these must hold:
- \$n \ll m\$, or
- \$n \gg m\$
That's saying that one of the input lists is essentially "empty" for purposes of Big-Oh analysis, and won't significantly contribute to the overall time.
Take e.g. the \$n \ll m\$ case. Since \$n\$ is so small, we could merge its entries into the larger list with negligible cost. Or just use a \$O(\log m+n))\$ solution, confident that that's really \$O(\log m)\$.
improve the time complexity of my code to at least \$O(\log m+n))\$.
Your code uses a linear time expression, nums1 + nums2
.
You need to be able to talk about "the midpoint of the combined arrays"
without ever touching the majority of array elements.
A datastructure like this will assist with that:
class SlicedList:
"""Models a slice of a list using start and end index, with zero copies."""
def __init__(
self, nums: list[float], start: int = 0, end: int | None = None
) -> None:
self.nums = nums
self.start = start
self.end = len(nums) if end is None else end
def slice(self, start: int, end: int | None = None) -> "SlicedList":
length = self.end - self.start
end_i: int = length if end is None else end
assert 0 <= start <= end_i <= length
return SlicedList(self.nums, self.start + start, self.start + end_i)
def __getitem__(self, i: int) -> float:
return self.nums[self.start + i]
def __len__(self) -> int:
return self.end - self.start
And then this will help with the binary searching.
def kth_idx(a: SlicedList, b: SlicedList, k: int) -> tuple[int, int]: # noqa PLR0911
"""Given two sorted lists of numbers, return (list_id, idx) in O(log n) time.
We are concerned with the kth element of the merged lists a and b.
list_id: 0 for a, 1 for b, according to which contains the kth sorted value
idx: index of the kth element in either list a or b
"""
assert 0 <= k < len(a) + len(b)
if not a:
return 1, k
if not b:
return 0, k
if k == 0:
return int(a[0] > b[0]), k
# binary search
ia, ib = len(a) // 2, len(b) // 2
if ia + ib < k:
if a[ia] > b[ib]:
return kth_idx(a, b.slice(ib + 1), k - ib - 1)
return kth_idx(a.slice(ia + 1), b, k - ia - 1)
if a[ia] > b[ib]:
return kth_idx(a.slice(0, ia), b, k)
return kth_idx(a, b.slice(0, ib), k)
-
\$\begingroup\$ And
nums.sort()
is even worse thannums1+nums2
. \$\endgroup\$Teepeemm– Teepeemm2024年12月12日 23:54:58 +00:00Commented Dec 12, 2024 at 23:54
The previous answer addresses your main concern about complexity. This will address one of the now-deleted comments on the question regarding functionality.
Portability
The function that was posted must be part of a class which was not posted. The code can't be run as-is due to the presence of the self
keyword. I removed self
and added an example function call:
def findMedianSortedArrays(nums1, nums2):
nums = nums1 + nums2
nums.sort
if len(nums) % 2 == 0:
return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0
else:
return nums[len(nums) // 2]
print(findMedianSortedArrays([6,4,5], [2,3,1]))
When I run the code on Python version 2.7, I get the expected answer:
3.5
However, when I run the code on Python version 3.7, I get an error:
TypeError: list indices must be integers or slices, not float
Since Python 2.7 is deprecated, I suggest modifying the code to run on newer 3.x versions as well.
Naming
The PEP 8 style guide recommends using snake_case for function names:
find_median_sorted_arrays
Explore related questions
See similar questions with these tags.