I am solving interview questions from here.
Problem: Given an N cross M matrix in which each row is sorted, find the overall median of the matrix. Assume N*M is odd.
Note: No extra memory is allowed.
For example:
Matrix = [1, 3, 5]
         [2, 6, 9]
         [3, 6, 9]
A = [1, 2, 3, 3, 5, 6, 6, 9, 9]
Median is 5. So, output is 5.
This is my approach:
def find_median( A):
    """Returns the median value from given list"""
    for i in range(1,len(A)):
        A[0].extend(A[i])
    return (sorted(A[0])).pop(len(A[0])/2)
Test Cases:
assert find_median([[1,3,5],[2,5,9],[3,6,11]]) == 5
assert find_median([[0,1,1],[2,6,10],[3,5,9]]) == 3
assert find_median([[1,3,4,12,14],[1,6,9,10,15],[0,1,3,3,4]]) == 4
I am able to solve the problem, but I wanted to know whether there is a better approach.
3 Answers
Sorting the contents of the matrix and then picking the index with the median value is a good approach. Let's see if we can do it with constant extra memory.
for i in range(1,len(A)):
    A[0].extend(A[i])
This extends the first row of the matrix to contain every row in a flat list. Before, the matrix was of size N * M, whereas now it is N * M (the first row) + (N - 1) * M (all the other rows). Subtracting the original size from this tells us how much extra memory we are using: (N - 1) * M additional memory, or in other words O(NM) extra memory. This is not what we want.
The reason to put all the elements in one list is to make sorting easy. Let's see if we can sort without needing a flattened (1D) list. There are many sorts that don't require extra memory; they are called "in-place" sorting algorithms. For simplicity we will modify selection sort to work for our case.
Selection sort works by picking the smallest element in the list and putting it at the front. Then it finds the next smallest element and puts it second, and so forth. To implement this, we can find the smallest element in the whole list and swap it with the first element. Then we can find the smallest element of the list while skipping the first slot, and so on.
def index_of_smallest(numbers, starting_index):
    # Assume numbers is not empty.
    smallest, index = numbers[starting_index], starting_index
    for i, number in enumerate(numbers[starting_index:], starting_index):
        if number < smallest:
            smallest, index = number, i
    return index

def selection_sort(numbers):
    size = len(numbers)
    for i in range(size):
        index = index_of_smallest(numbers, i)
        numbers[i], numbers[index] = numbers[index], numbers[i]
    # Don't return anything, we are modifying it inplace.
Now we need this process to work on a matrix instead of a flat list. This is straightforward enough: we can loop over the matrix (left to right, top to bottom) and ignore cells we have already dealt with. In the code below, x is the row coordinate and y is the column coordinate.
def coordinates_of_smallest(matrix, starting_x, starting_y):
    smallest, smallest_x, smallest_y = matrix[starting_x][starting_y], starting_x, starting_y
    for x, row in enumerate(matrix):
        for y, cell in enumerate(row):
            if x < starting_x or (x == starting_x and y < starting_y):
                continue
            if cell < smallest:
                smallest, smallest_x, smallest_y = cell, x, y
    return smallest_x, smallest_y

def selection_sort(matrix):
    # Assume the matrix is not empty.
    n, m = len(matrix), len(matrix[0])
    for x in range(n):
        for y in range(m):
            smallest_x, smallest_y = coordinates_of_smallest(matrix, x, y)
            matrix[x][y], matrix[smallest_x][smallest_y] = matrix[smallest_x][smallest_y], matrix[x][y]

>>> matrix = [[1, 3, 5], [2, 6, 9], [3, 6, 9]]
>>> selection_sort(matrix)
>>> print(matrix) # [[1, 2, 3], [3, 5, 6], [6, 9, 9]]
Now getting the median of this is a piece of cake: it will be in the middle slot of the middle row! Since N * M is odd, both N and M must be odd. Therefore the median is at matrix[N // 2][M // 2].
There is a little room for improvement here. While we only use constant extra memory, our time complexity has gone up from O(nm log(nm)) to O((nm)**2). For a better time complexity, I would recommend using in-place quicksort, which brings us back to O(nm log(nm)) on average.
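As a rough illustration of that suggestion, here is a minimal sketch of an in-place quicksort that treats the matrix as one virtual flat list by mapping a flat position i to the cell (i // m, i % m). The function name quicksort_matrix and its helpers are made up for this example, and the partition scheme (Lomuto, with a random pivot) is just one possible choice. The recursion stack needs O(log(nm)) extra space, and inputs with many equal values can still push this particular partition scheme towards quadratic time.

```python
import random


def quicksort_matrix(matrix):
    """Sort all cells of `matrix` in place, in row-major order."""
    m = len(matrix[0])  # assumes a non-empty, rectangular matrix

    def get(i):
        # Value stored at flat (row-major) position i.
        return matrix[i // m][i % m]

    def swap(i, j):
        a, b = divmod(i, m)
        c, d = divmod(j, m)
        matrix[a][b], matrix[c][d] = matrix[c][d], matrix[a][b]

    def sort(lo, hi):
        # Sort the flat positions lo..hi (inclusive).
        while lo < hi:
            swap(random.randint(lo, hi), hi)  # random pivot moved to the end
            pivot = get(hi)
            store = lo
            for i in range(lo, hi):
                if get(i) < pivot:
                    swap(i, store)
                    store += 1
            swap(store, hi)  # pivot is now at its final position
            # Recurse into the smaller part and loop on the larger one,
            # which keeps the recursion depth logarithmic.
            if store - lo < hi - store:
                sort(lo, store - 1)
                lo = store + 1
            else:
                sort(store + 1, hi)
                hi = store - 1

    sort(0, len(matrix) * m - 1)


matrix = [[1, 3, 5], [2, 6, 9], [3, 6, 9]]
quicksort_matrix(matrix)
print(matrix)        # [[1, 2, 3], [3, 5, 6], [6, 9, 9]]
print(matrix[1][1])  # 5
```

After sorting, the median is read from the middle slot of the middle row, exactly as with the selection sort version.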
Another point is that we are doing too much work. Once we have worked our way up to row N // 2 and slot M // 2, we are actually done! We have put the median element in its place, and we can stop. This is a simple enough check to add, but it can cut the actual running time of the code roughly in half.
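The early exit could look something like the sketch below. It is a variation on the selection sort above rather than the exact code from this answer: the name find_median_in_place is invented for the example, and it walks the cells by flat row-major position so that the stopping point is a single number. Note that it reorders the matrix as a side effect.

```python
def find_median_in_place(matrix):
    """Return the median, partially sorting `matrix` in place."""
    n, m = len(matrix), len(matrix[0])  # assumes non-empty, rectangular
    median_pos = (n * m) // 2           # row-major position of the median
    # Place the smallest remaining value at each position, but only
    # up to and including the median position.
    for pos in range(median_pos + 1):
        x, y = divmod(pos, m)
        best_x, best_y = x, y
        # Scan only the cells after `pos` in row-major order.
        for other in range(pos + 1, n * m):
            ox, oy = divmod(other, m)
            if matrix[ox][oy] < matrix[best_x][best_y]:
                best_x, best_y = ox, oy
        matrix[x][y], matrix[best_x][best_y] = matrix[best_x][best_y], matrix[x][y]
    return matrix[median_pos // m][median_pos % m]


print(find_median_in_place([[1, 3, 5], [2, 6, 9], [3, 6, 9]]))  # 5
```

Only a handful of integer variables are used besides the matrix itself, so the extra memory stays constant.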
- this makes no use of the fact that the numbers in the row are in order, and the in-place sorting is unnecessary – Maarten Fabré, May 28, 2018 at 8:38
- @MaartenFabré The in-place sorting is critical for it to use constant additional memory. That's the number one priority, not time complexity. Yes, it is not necessary to do a full sort, but I don't think the additional code complexity is worth it. Also, yes, finding the smallest element can be done quicker by taking into account that the rows are sorted, but again it is additional code complexity for something that isn't a priority. – spyr03, May 29, 2018 at 13:00
Follow PEP8:
- A is a bad variable name; use, say, matrix.
- You should remove the space in front of the function argument.
- You should put spaces after ,.
- You don't need the () surrounding sorted.
- You could add some space around your division.
You can use // rather than / to make your code Python 2 and Python 3 compatible.
You don't need to use pop; normal indexing will work too.
def find_median(matrix):
    """Returns the median value from given matrix"""
    for i in range(1, len(matrix)):
        matrix[0].extend(matrix[i])
    return sorted(matrix[0])[len(matrix[0]) // 2]
Your code doesn't work as the challenge asks. If you add print(matrix) before return, you'll see:
[[1, 3, 5, 2, 6, 9, 3, 6, 9], [2, 6, 9], [3, 6, 9]]
- I think you could be a little clearer as to why the original code doesn't work; mentioning "you've used extra space" would be clearer. – spyr03, May 25, 2018 at 14:39
Making one list
Name your variables correctly. When you look back at your code in a few months, you will have to look for a few minutes to figure out what you did. You'll have to figure out that A[0] is the list with all the values of the rows appended, and that len(A[0])/2 is the index of the median.
PS. This code will fail in Python 3. If you really need floor division, use //, which is clear in both Python 2 and 3.
Instead of
for i in range(1,len(A)):
    A[0].extend(A[i])
at least you can do
all_elements = []
for row in A:
    all_elements.extend(row)
or even better, use itertools.chain.from_iterable:
from itertools import chain
all_elements = chain.from_iterable(A)
median_index = len(A) * len(A[0]) // 2
return sorted(all_elements)[median_index]
Alternative approach
In your solution, you'll have 3 copies of the whole matrix (+ whatever sorted uses internally):
- A[0] contains a copy of each element of the matrix because you append them all there.
- The rest of the rows of A also still exist, so they also contain an extra copy of each element (apart from the first row).
- sorted generates a third list.
By using chain you eliminate the first, so you are still left with 2 copies.
The easiest way to do this without copying the whole matrix into a sorted list is to use a sorted queue of iterators over the different rows, ordered by each iterator's next value, and to pop and reinsert on this queue until you have the median. I use bisect.insort_left for the insertion in order.
from bisect import insort_left
from collections import deque

def find_median(matrix):
    """
    finds the median in a matrix with sorted rows
    """
    median_index = len(matrix) * len(matrix[0]) // 2
    iterators = map(iter, matrix)
    # Each entry is (next value, row number, iterator); the deque stays sorted.
    iterators = deque(sorted((next(it), row, it) for (row, it) in enumerate(iterators)))
    idx = 0
    while idx <= median_index:
        value, row, iterator = iterators.popleft()
        try:
            item = next(iterator), row, iterator
            insort_left(iterators, item)
        except StopIteration:
            pass
        idx += 1
        # print(value, idx)
    return value
The deque consumes some extra memory, but only O(N) instead of O(NM). This can also be done using a list of length N with the index of the iteration, doing the iteration over the different rows yourself.
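The list-of-indices variant mentioned above might be sketched as follows. This is an illustration under my own naming (find_median_with_positions and positions are not from the code above): it keeps one "next unread column" index per row and repeatedly advances the row whose current value is smallest. It trades the sorted deque for a linear scan over the N rows at every step, so it is simpler but slower when N is large.

```python
def find_median_with_positions(matrix):
    """Median of a matrix with sorted rows, using O(N) extra memory."""
    median_index = len(matrix) * len(matrix[0]) // 2
    positions = [0] * len(matrix)  # next unread column for each row
    value = None
    # Consume the smallest available value median_index + 1 times.
    for _ in range(median_index + 1):
        best_row = None
        for row, pos in enumerate(positions):
            if pos >= len(matrix[row]):
                continue  # this row is exhausted
            if best_row is None or matrix[row][pos] < matrix[best_row][positions[best_row]]:
                best_row = row
        value = matrix[best_row][positions[best_row]]
        positions[best_row] += 1
    return value


print(find_median_with_positions([[1, 3, 5], [2, 6, 9], [3, 6, 9]]))  # 5
```

Unlike the in-place sorting approaches, this leaves the matrix unmodified.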
The row is added to the item as tiebreaker when there are multiple rows with the same value because iterators are not sortable.
Standard library
I found out that heapq.merge does the same as what I do with the deque of iterators, so this works too:
from heapq import merge
from itertools import islice

def find_median_heapq(matrix):
    median_index = len(matrix) * len(matrix[0]) // 2
    all_items = merge(*matrix)
    return next(islice(all_items, median_index, None))
- A[0] doesn't copy, what do you mean by it? "The original A, A[0] and" – May 25, 2018 at 12:09
- A[0] contains a copy of each element of the matrix because OP appends them all there. The rest of the rows A also still exist, so they also contain an extra copy of each element (apart from the first row), and sorted generates a third list – Maarten Fabré, May 25, 2018 at 12:18
- Please can you clarify that in your answer, as it currently reads as if indexing a list returns a copy. – May 25, 2018 at 12:20
- Shouldn't "1st" be used instead of "1nd"? – 301_Moved_Permanently, May 25, 2018 at 13:30
A[0].extend(A[i]) have to allocate \$\Theta(MN)\$ extra memory in order to extend the list. If you're having trouble telling how much extra memory you are using, it might help to use the __sizeof__ method to determine how much memory a particular object is using, for example A[0].__sizeof__() tells you the memory used by the list A[0] in bytes.
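A small sketch of that measurement, applied to the extend loop from the question. The helper name first_row_sizes is made up for this example, and the exact byte counts depend on the interpreter and platform, so only the growth matters here.

```python
def first_row_sizes(A):
    """Return the size in bytes of A[0] before and after the extend loop."""
    before = A[0].__sizeof__()
    for i in range(1, len(A)):
        A[0].extend(A[i])  # same loop as in the question
    after = A[0].__sizeof__()
    return before, after


before, after = first_row_sizes([[1, 3, 5], [2, 6, 9], [3, 6, 9]])
print(before, after)  # the second number is larger: A[0] had to grow
```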