Median of two sorted arrays in Python

Question 1

Problem Statement

^{(Source: Leetcode Problem 4: Median of Two Sorted Arrays [Hard])
(Topics: [Array] [Binary Search] [Divide and Conquer])}

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.

The overall time complexity should be O(log(m+n)).

I was able to solve this problem with the following code:

def findMedianSortedArrays(self, nums1, nums2):
 nums = nums1 + nums2
 nums.sort()
 if len(nums) % 2 == 0:
 return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0
 else:
 return nums[len(nums) // 2]

which has a time complexity of \$O((m+n)\log(m+n))\$ and space complexity of \$O(m+n)\$.

Admittedly while I didn't find the problem too difficult overall, I do want to still try to improve the time complexity of my code to at least \$O(\log(m+n))\$.

My question is:

What am I misunderstanding about the approach needed to make the code run in \$O(\log(m+n))\$ time?

Edit

Per toolic's suggestion, I have modified the code to run on newer versions of Python (of course also using snake_case to write function names for readability). I used this online Python editor to test my fixes:

def find_median_two_sorted_arrays(nums1: list, nums2: list):
 nums = nums1 + nums2
 nums.sort
 if len(nums) % 2 == 0:
 return (nums[(len(nums) // 2) - 1] + nums[len(nums) // 2]) / 2.0
 else:
 return nums[len(nums) // 2]

^{Note: I would like to say that I know (after doing some research) that this can be done in \$O(\log(\min(n,m)))\$ time complexity with \$O(1)\$ space complexity. However, this is not what I want as this is admittedly very out of my reach with my understanding of Python.}

Question 2

A O(log(min(m, n))) approach has been mentioned here: codereview.stackexchange.com/a/202358/35991.

Question 3

@TobySpeight The time-limit-exceeded tag states "You may use this tag instead of performance when an online judge rejects your solution to a programming-challenge due to time constraints". We explicitly allow code which doesn't scale.

Question 4

The crucial point of the task is finding a median of two arrays, not a single array obtained by joining the two.

Question 5

mentioned that \$O(\log \min(n,m))\$ would be really outside of what I can understand with my current knowledge.

Nah! It's really about the sizes \$n, m\$, rather than about the algorithm. If one is of comparable magnitude to the other, then \$O(\min(n, m))\$ is simply \$O(n)\$, a case that's easily dealt with. For the \$\min\$ to be interesting, one of these must hold:

\$n \ll m\$, or
\$n \gg m\$

That's saying that one of the input lists is essentially "empty" for purposes of Big-Oh analysis, and won't significantly contribute to the overall time.

Take e.g. the \$n \ll m\$ case. Since \$n\$ is so small, we could merge its entries into the larger list with negligible cost. Or just use a \$O(\log m+n))\$ solution, confident that that's really \$O(\log m)\$.

improve the time complexity of my code to at least \$O(\log m+n))\$.

Your code uses a linear time expression, nums1 + nums2. You need to be able to talk about "the midpoint of the combined arrays" without ever touching the majority of array elements. A datastructure like this will assist with that:

class SlicedList:
 """Models a slice of a list using start and end index, with zero copies."""
 def __init__(
 self, nums: list[float], start: int = 0, end: int | None = None
 ) -> None:
 self.nums = nums
 self.start = start
 self.end = len(nums) if end is None else end
 def slice(self, start: int, end: int | None = None) -> "SlicedList":
 length = self.end - self.start
 end_i: int = length if end is None else end
 assert 0 <= start <= end_i <= length
 return SlicedList(self.nums, self.start + start, self.start + end_i)
 def __getitem__(self, i: int) -> float:
 return self.nums[self.start + i]
 def __len__(self) -> int:
 return self.end - self.start

And then this will help with the binary searching.

def kth_idx(a: SlicedList, b: SlicedList, k: int) -> tuple[int, int]: # noqa PLR0911
 """Given two sorted lists of numbers, return (list_id, idx) in O(log n) time.
 We are concerned with the kth element of the merged lists a and b.
 list_id: 0 for a, 1 for b, according to which contains the kth sorted value
 idx: index of the kth element in either list a or b
 """
 assert 0 <= k < len(a) + len(b)
 if not a:
 return 1, k
 if not b:
 return 0, k
 if k == 0:
 return int(a[0] > b[0]), k
 # binary search
 ia, ib = len(a) // 2, len(b) // 2
 if ia + ib < k:
 if a[ia] > b[ib]:
 return kth_idx(a, b.slice(ib + 1), k - ib - 1)
 return kth_idx(a.slice(ia + 1), b, k - ia - 1)
 if a[ia] > b[ib]:
 return kth_idx(a.slice(0, ia), b, k)
 return kth_idx(a, b.slice(0, ib), k)

Question 6

And nums.sort() is even worse than nums1+nums2.

Question 7

The previous answer addresses your main concern about complexity. This will address one of the now-deleted comments on the question regarding functionality.

Portability

The function that was posted must be part of a class which was not posted. The code can't be run as-is due to the presence of the self keyword. I removed self and added an example function call:

def findMedianSortedArrays(nums1, nums2):
 nums = nums1 + nums2
 nums.sort
 if len(nums) % 2 == 0:
 return (nums[(len(nums) / 2) - 1] + nums[len(nums) / 2]) / 2.0
 else:
 return nums[len(nums) // 2] 
print(findMedianSortedArrays([6,4,5], [2,3,1]))

When I run the code on Python version 2.7, I get the expected answer:

3.5

However, when I run the code on Python version 3.7, I get an error:

TypeError: list indices must be integers or slices, not float

Since Python 2.7 is deprecated, I suggest modifying the code to run on newer 3.x versions as well.

Naming

The PEP 8 style guide recommends using snake_case for function names:

find_median_sorted_arrays

J_H J_H 41.7k3 gold badges38 silver badges157 bronze badges · Accepted Answer · 2024-12-06 18:20:06Z

mentioned that \$O(\log \min(n,m))\$ would be really outside of what I can understand with my current knowledge.

Nah! It's really about the sizes \$n, m\$, rather than about the algorithm. If one is of comparable magnitude to the other, then \$O(\min(n, m))\$ is simply \$O(n)\$, a case that's easily dealt with. For the \$\min\$ to be interesting, one of these must hold:

\$n \ll m\$, or
\$n \gg m\$

That's saying that one of the input lists is essentially "empty" for purposes of Big-Oh analysis, and won't significantly contribute to the overall time.

Take e.g. the \$n \ll m\$ case. Since \$n\$ is so small, we could merge its entries into the larger list with negligible cost. Or just use a \$O(\log m+n))\$ solution, confident that that's really \$O(\log m)\$.

improve the time complexity of my code to at least \$O(\log m+n))\$.

Your code uses a linear time expression, nums1 + nums2. You need to be able to talk about "the midpoint of the combined arrays" without ever touching the majority of array elements. A datastructure like this will assist with that:

class SlicedList:
 """Models a slice of a list using start and end index, with zero copies."""
 def __init__(
 self, nums: list[float], start: int = 0, end: int | None = None
 ) -> None:
 self.nums = nums
 self.start = start
 self.end = len(nums) if end is None else end
 def slice(self, start: int, end: int | None = None) -> "SlicedList":
 length = self.end - self.start
 end_i: int = length if end is None else end
 assert 0 <= start <= end_i <= length
 return SlicedList(self.nums, self.start + start, self.start + end_i)
 def __getitem__(self, i: int) -> float:
 return self.nums[self.start + i]
 def __len__(self) -> int:
 return self.end - self.start

And then this will help with the binary searching.

def kth_idx(a: SlicedList, b: SlicedList, k: int) -> tuple[int, int]: # noqa PLR0911
 """Given two sorted lists of numbers, return (list_id, idx) in O(log n) time.
 We are concerned with the kth element of the merged lists a and b.
 list_id: 0 for a, 1 for b, according to which contains the kth sorted value
 idx: index of the kth element in either list a or b
 """
 assert 0 <= k < len(a) + len(b)
 if not a:
 return 1, k
 if not b:
 return 0, k
 if k == 0:
 return int(a[0] > b[0]), k
 # binary search
 ia, ib = len(a) // 2, len(b) // 2
 if ia + ib < k:
 if a[ia] > b[ib]:
 return kth_idx(a, b.slice(ib + 1), k - ib - 1)
 return kth_idx(a.slice(ia + 1), b, k - ia - 1)
 if a[ia] > b[ib]:
 return kth_idx(a.slice(0, ia), b, k)
 return kth_idx(a, b.slice(0, ib), k)

\$\begingroup\$ And nums.sort() is even worse than nums1+nums2. \$\endgroup\$

Teepeemm
– Teepeemm

2024年12月12日 23:54:58 +00:00
Commented Dec 12, 2024 at 23:54

Stack Exchange Network

Median of two sorted arrays in Python

Problem Statement

Edit

2 Answers 2

Portability

Naming

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Median of two sorted arrays in Python

Problem Statement

Edit

2 Answers 2

Portability

Naming

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions