Calculate median of a matrix

Question 1

I am solving interview questions from here.

Problem: Given an N x M matrix in which each row is sorted, find the overall median of the matrix. Assume N*M is odd.

Note: No extra memory is allowed. For example:

Matrix = [1, 3, 5]
         [2, 6, 9]
         [3, 6, 9]
A = [1, 2, 3, 3, 5, 6, 6, 9, 9]
Median is 5. So, output is 5.

This is my approach:

def find_median( A):
    """Returns the median value from given list"""
    for i in range(1,len(A)):
        A[0].extend(A[i])
    return (sorted(A[0])).pop(len(A[0])/2)

Test Cases:

assert find_median([[1,3,5],[2,5,9],[3,6,11]]) == 5
assert find_median([[0,1,1],[2,6,10],[3,5,9]]) == 3
assert find_median([[1,3,4,12,14],[1,6,9,10,15],[0,1,3,3,4]]) == 4

I am able to solve the problem, but I wanted to know whether there is a better approach to solving it.

Question 2

Presumably by "no extra memory is allowed" they actually mean "$O(1)$ extra memory is allowed" (otherwise there would be no solution). But the code in the post uses $\Theta(NM)$ extra memory and so does not solve the problem.

Question 3

@Gareth Raees Updated the code so that it meets the constraint "no extra memory allocation required".

Question 4

It's not there yet: the calls to A[0].extend(A[i]) have to allocate $\Theta(MN)$ extra memory in order to extend the list. If you're having trouble telling how much extra memory you are using, it might help to use the __sizeof__ method to determine how much memory a particular object is using; for example, A[0].__sizeof__() tells you the memory used by the list A[0] in bytes.
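A minimal sketch of the measurement suggested above, run on the example matrix from the question. Note that list.__sizeof__() reports the size of the list object itself (its header plus its array of element slots), not the objects it refers to, and the exact byte counts depend on the Python version and platform, so no specific numbers are claimed here.

```python
matrix = [[1, 3, 5], [2, 6, 9], [3, 6, 9]]

before = matrix[0].__sizeof__()  # bytes used by the first row's list object
for i in range(1, len(matrix)):
    matrix[0].extend(matrix[i])  # same loop as in the question
after = matrix[0].__sizeof__()

# The exact numbers vary between Python builds; the point is that they grow.
print(before, after)
```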

Question 5

I'm pretty sure this can be solved without allocating any memory on the heap. You will need some allocation to keep track of some counters (namely $O(N)$, one per row). You know at which index the median will be (because you know the dimensions of the matrix). Since all rows are sorted, you can use a merge-sort-like iteration over the rows to get to that index. You'll need a "current" counter for each row, though.
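A minimal sketch of the merge-like walk described above, assuming plain Python lists; the function name find_median_counters is hypothetical. The only extra storage is a list of N position counters, so it uses O(N) extra memory, and each of the roughly NM / 2 steps scans the N row heads.

```python
def find_median_counters(matrix):
    """Median of a matrix whose rows are sorted, via a merge-like walk.

    Hypothetical helper: assumes a non-empty matrix with an odd number of cells.
    Extra memory is one position counter per row; the data is never copied.
    """
    n, m = len(matrix), len(matrix[0])
    median_index = n * m // 2
    positions = [0] * n  # positions[r] = index of the next unread cell in row r
    value = None
    for _ in range(median_index + 1):
        # Pick the row whose next unread value is smallest.
        best_row = None
        for r in range(n):
            if positions[r] < m and (
                best_row is None
                or matrix[r][positions[r]] < matrix[best_row][positions[best_row]]
            ):
                best_row = r
        value = matrix[best_row][positions[best_row]]
        positions[best_row] += 1
    return value


print(find_median_counters([[1, 3, 5], [2, 6, 9], [3, 6, 9]]))  # 5
```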

Question 6

I have been educated in chat. The solution is not what you have here, but it's not pretty in terms of time complexity :/

Question 7

Sorting the contents of the matrix and then picking the index with the median value is a good approach. Let's see if we can do it with constant extra memory.

for i in range(1,len(A)):
    A[0].extend(A[i])

This extends the first row of the matrix to contain every row in a flat list. Before, the matrix held N * M elements; now it holds N * M (the extended first row) + (N - 1) * M (all the other rows). Subtracting the original size tells us how much extra memory we are using: (N - 1) * M additional elements, or in other words O(NM) extra memory. This is not what we want.

The reason to put all the elements in one list is to make sorting easy. Let's see if we can sort without needing a flattened (1D) list. There are many sorts that don't require extra memory; they are called "in-place" sorting algorithms. For simplicity, we will modify selection sort to work for our case.

Selection sort works by picking the smallest element in the list and putting it at the front. Then it finds the next smallest element and puts it second, and so forth. To implement this, we can find the smallest element in the whole list and swap it with the first element. Then we find the smallest element of the list skipping the first slot, swap it into the second slot, and so on.

def index_of_smallest(numbers, starting_index):
    # Assume numbers is not empty.
    smallest, index = numbers[starting_index], starting_index
    # Loop over indices instead of numbers[starting_index:], since slicing would copy the list.
    for i in range(starting_index, len(numbers)):
        if numbers[i] < smallest:
            smallest, index = numbers[i], i
    return index


def selection_sort(numbers):
    size = len(numbers)
    for i in range(size):
        index = index_of_smallest(numbers, i)
        numbers[i], numbers[index] = numbers[index], numbers[i]
    # Don't return anything; we are modifying the list in place.

Now, we need this process to work on a matrix instead of a flat list. This is straightforward enough: we can loop over the matrix (left to right, top to bottom) and ignore cells we have already dealt with. In the code below, x is the row coordinate and y is the column coordinate.

def coordinates_of_smallest(matrix, starting_x, starting_y):
    smallest, smallest_x, smallest_y = matrix[starting_x][starting_y], starting_x, starting_y
    for x, row in enumerate(matrix):
        for y, cell in enumerate(row):
            if x < starting_x or (x == starting_x and y < starting_y):
                continue
            if cell < smallest:
                smallest, smallest_x, smallest_y = cell, x, y
    return smallest_x, smallest_y


def selection_sort(matrix):
    # Assume the matrix is not empty.
    n, m = len(matrix), len(matrix[0])
    for x in range(n):
        for y in range(m):
            smallest_x, smallest_y = coordinates_of_smallest(matrix, x, y)
            matrix[x][y], matrix[smallest_x][smallest_y] = matrix[smallest_x][smallest_y], matrix[x][y]

>>> matrix = [[1, 3, 5], [2, 6, 9], [3, 6, 9]]
>>> selection_sort(matrix)
>>> print(matrix)
[[1, 2, 3], [3, 5, 6], [6, 9, 9]]

Now getting the median is a piece of cake: it will be in the middle slot of the middle row! Since N * M is odd, both N and M must be odd. Therefore, the median is at matrix[N // 2][M // 2].
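As a quick check that the middle slot of the middle row really is the median position ⌊NM/2⌋ in row-major order, write the odd dimensions as N = 2a + 1 and M = 2b + 1:

$$
\left\lfloor \frac{NM}{2} \right\rfloor
= \frac{(2a+1)(2b+1) - 1}{2}
= 2ab + a + b
= a(2b+1) + b
= aM + b,
$$

and since $0 \le b < M$, that flat position is row $a = \lfloor N/2 \rfloor$, column $b = \lfloor M/2 \rfloor$.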

There is a little room for improvement here. While we only use constant extra memory, our time complexity has gone up from O(NM log(NM)) to O((NM)^2). For a better time complexity, I would recommend using an in-place quicksort, which brings us back to O(NM log(NM)) on average.
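A rough sketch of an in-place quicksort over the matrix, assuming it is treated as one row-major sequence in which flat index k maps to matrix[k // M][k % M]; the helper names are hypothetical. It uses Lomuto partitioning with the middle element as the pivot (the rows are already sorted, so always pivoting on the last element would be a poor choice) and recurses only into the smaller partition. That keeps the recursion depth at O(log(NM)), so strictly speaking the extra memory is logarithmic rather than constant, and the O(NM log(NM)) time is the average case.

```python
def _get(matrix, k):
    """Read the cell at row-major flat index k."""
    row, col = divmod(k, len(matrix[0]))
    return matrix[row][col]


def _swap(matrix, i, j):
    """Swap the cells at row-major flat indices i and j."""
    m = len(matrix[0])
    ri, ci = divmod(i, m)
    rj, cj = divmod(j, m)
    matrix[ri][ci], matrix[rj][cj] = matrix[rj][cj], matrix[ri][ci]


def _partition(matrix, lo, hi):
    """Lomuto partition of flat range [lo, hi] around its middle element."""
    _swap(matrix, (lo + hi) // 2, hi)  # move the middle element into the pivot slot
    pivot = _get(matrix, hi)
    store = lo
    for i in range(lo, hi):
        if _get(matrix, i) < pivot:
            _swap(matrix, i, store)
            store += 1
    _swap(matrix, store, hi)  # put the pivot between the two parts
    return store


def quicksort_matrix(matrix, lo=0, hi=None):
    """Sort a non-empty matrix in place, treating it as one row-major sequence."""
    if hi is None:
        hi = len(matrix) * len(matrix[0]) - 1
    while lo < hi:
        p = _partition(matrix, lo, hi)
        # Recurse into the smaller part and keep looping on the larger one,
        # so the recursion depth stays O(log(N * M)).
        if p - lo < hi - p:
            quicksort_matrix(matrix, lo, p - 1)
            lo = p + 1
        else:
            quicksort_matrix(matrix, p + 1, hi)
            hi = p - 1


matrix = [[1, 3, 5], [2, 6, 9], [3, 6, 9]]
quicksort_matrix(matrix)
print(matrix)  # [[1, 2, 3], [3, 5, 6], [6, 9, 9]]
print(matrix[len(matrix) // 2][len(matrix[0]) // 2])  # 5
```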

Another point is that we are doing too much work. Once we have worked our way up to row N // 2 and slot M // 2, we are actually done! We have put the median element in its place, and we can stop. This is a simple enough check to add, and because coordinates_of_smallest scans the whole matrix on every call, stopping halfway can cut the running time of the code roughly in half.
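A sketch of the early-stopping idea, written against row-major flat indices rather than (x, y) pairs; find_median_selection is a hypothetical name. Each pass scans only the unsorted tail, and the function returns as soon as flat position NM // 2 (which is matrix[N // 2][M // 2] when N and M are odd) has been filled. Like the full selection sort, it mutates the input matrix.

```python
def find_median_selection(matrix):
    """Selection sort over row-major positions, stopping once the median is placed.

    Assumes a non-empty matrix whose cell count N * M is odd; mutates it in place.
    """
    n, m = len(matrix), len(matrix[0])
    median_index = n * m // 2
    for k in range(median_index + 1):
        # Find the smallest value in the unsorted tail k .. n*m - 1.
        best = k
        for j in range(k + 1, n * m):
            jx, jy = divmod(j, m)
            bx, by = divmod(best, m)
            if matrix[jx][jy] < matrix[bx][by]:
                best = j
        # Swap it into position k.
        x, y = divmod(k, m)
        bx, by = divmod(best, m)
        matrix[x][y], matrix[bx][by] = matrix[bx][by], matrix[x][y]
    # Position median_index now holds its final value.
    x, y = divmod(median_index, m)
    return matrix[x][y]


print(find_median_selection([[1, 3, 5], [2, 6, 9], [3, 6, 9]]))  # 5
```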

Question 8

This makes no use of the fact that the numbers in each row are in order, and the in-place sorting is unnecessary.

Question 9

@MaartenFabré The in-place sorting is critical for it to use constant additional memory. That's the number one priority, not time complexity. Yes, it is not necessary to do a full sort, but I don't think the additional code complexity is worth it. Also, yes, finding the smallest element can be done more quickly by taking into account that the rows are sorted, but again it is additional code complexity for something that isn't a priority.

Question 10

- Follow PEP8:
  - A is a bad variable name; use, say, matrix.
  - You should remove the space in front of the function argument.
  - You should put spaces after commas.
  - You don't need the () surrounding sorted(...).
  - You could add some space around your division.
- You can use // rather than / to make your code compatible with both Python 2 and Python 3.
- You don't need to use pop; normal indexing will work too.

def find_median(matrix):
    """Returns the median value from given matrix"""
    for i in range(1, len(matrix)):
        matrix[0].extend(matrix[i])
    return sorted(matrix[0])[len(matrix[0]) // 2]

Your code doesn't work as the challenge asks. If you add print(matrix) before the return, you'll see:

[[1, 3, 5, 2, 6, 9, 3, 6, 9], [2, 6, 9], [3, 6, 9]]

Question 11

I think you could be a little clearer as to why the original code doesn't work; mentioning "you've used extra space" would be clearer.

Question 12

Making one list

Name your variables correctly. When you look back at your code in a few months, you will have to look for a few minutes to figure out what you did. You'll have to figure out that A[0] is the list with all the values of the rows appended, and that len(A[0])/2 is the index of the median.

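PS: this code will fail in Python 3, where / always performs true division and returns a float, which cannot be used as a list index. If you really need floor division, use //, which behaves the same in both Python 2 and 3.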
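A small demonstration of the difference, using the flattened example list from the question; the values in the comments assume Python 3.

```python
values = [1, 2, 3, 3, 5, 6, 6, 9, 9]

print(len(values) / 2)   # 4.5: true division always returns a float
print(len(values) // 2)  # 4: floor division, same result in Python 2 and 3

try:
    values[len(values) / 2]
except TypeError as exc:
    # list indices must be integers or slices, not float
    print("TypeError:", exc)

print(values[len(values) // 2])  # 5
```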

Instead of

for i in range(1,len(A)):
    A[0].extend(A[i])

at least you can do

all_elements = []
for row in A:
    all_elements.extend(row)

or even better, use itertools.chain.from_iterable:

from itertools import chain


def find_median(A):
    all_elements = chain.from_iterable(A)
    median_index = len(A) * len(A[0]) // 2
    return sorted(all_elements)[median_index]

Alternative approach

In your solution, you'll have 3 copies of the whole matrix (+ whatever sorted uses internally):

- A[0] contains a copy of each element of the matrix because OP appends them all there.
- The remaining rows of A also still exist, so they also contain an extra copy of each element (apart from the first row).
- sorted generates a third list.

By using chain you eliminate the first, so you are left with 2 copies: the original matrix and the list that sorted builds.

The easiest way to do this without copying the whole matrix into a sorted list is to use a sorted queue of iterators over the different rows, ordered by each row's next value, and to pop and reinsert on this queue until you reach the median. I use bisect.insort_left to insert in order.

from bisect import insort_left
from collections import deque


def find_median(matrix):
    """
    finds the median in a matrix with sorted rows
    """
    median_index = len(matrix) * len(matrix[0]) // 2
    iterators = map(iter, matrix)
    iterators = deque(sorted((next(it), row, it) for (row, it) in enumerate(iterators)))
    idx = 0
    while idx <= median_index:
        value, row, iterator = iterators.popleft()
        try:
            item = next(iterator), row, iterator
            insort_left(iterators, item)
        except StopIteration:
            pass
        idx += 1
    return value

The deque consumes some extra memory, but only O(N) instead of O(NM). This can also be done using a list of length N with the index of the iteration, doing the iteration over the different rows yourself.

The row index is added to each item as a tiebreaker for when multiple rows have the same next value, because iterators themselves are not comparable.
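A small illustration of why the row index is needed, assuming Python 3, where ordering comparisons between objects such as iterators raise TypeError. The two iterators here are made-up examples, not part of the answer's code.

```python
it_a = iter([3, 5])
it_b = iter([3, 6])

# Two queue entries with the same value: comparing the tuples falls through
# to comparing the iterators, which Python 3 refuses to do.
try:
    sorted([(3, it_a), (3, it_b)])
except TypeError as exc:
    print("TypeError:", exc)

# With the row index in between, ties are decided before the iterators are reached.
ordered = sorted([(3, 1, it_b), (3, 0, it_a)])
print([row for value, row, it in ordered])  # [0, 1]
```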

Standard library

I found out that heapq.merge does the same thing as my deque of iterators, so this works too:

from heapq import merge
from itertools import islice


def find_median_heapq(matrix):
    median_index = len(matrix) * len(matrix[0]) // 2
    all_items = merge(*matrix)
    return next(islice(all_items, median_index, None))

Question 13

A[0] doesn't copy; what do you mean by "The original A, A[0] and"?

Question 14

A[0] contains a copy of each element of the matrix because OP appends them all there. The remaining rows of A also still exist, so they contain an extra copy of each element (apart from the first row), and sorted generates a third list.

Question 15

Could you clarify that in your answer? As it currently stands, it reads as if indexing a list returns a copy.

Question 16

Shouldn't "1st" be used instead of "1nd"?

spyr03 · Accepted Answer · 2018-05-25 14:36:33Z
