Find the kth smallest element in row wise and column wise sorted element:
Example:
matrix[row][col] < matrix[row+1][col]
matrix[row][col] < matrix[row][col+1]
mat=
1 2 21
11 23 25
31 36 41
I have written following code, How to make more readable and improve its complexity:
import heapq
def kth_smallest(k, mat):
m = len(mat)
n = len(mat[0])
hp = []
for i in xrange(n):
heapq.heappush(hp,[mat[0][i], 0, i])
heapq.heapify(hp)
while k >1:
x = heapq.heappop(hp)
if x[1]+1 < m:
heapq.heappush(hp,[mat[x[1]+1][x[2]], x[1]+1, x[2]])
heapq.heapify(hp)
k-=1
return hp[0][0]
print kth_smallest(8, [[1,10,12],[2,11,21], [3,31,45]])
3 Answers 3
heappush
andheappop
maintain the heap invariant; there is no need to callheapify
after them. Alternatively, you could initialize the heap from an unsorted list usingheapify
and skip theheappush
, but as the list is already sorted in this case, it is already a heap and you don't need to apply either function.- You push the whole first row to the heap but the first
k
elements would suffice thanks to the rows being already sorted. Instead of
for i in xrange(n): heapq.heappush(hp,[mat[0][i], 0, i])
it would be more idiomatic to use
for i, value in enumerate(mat[0]): heapq.heappush(hp, [value, 0, i])
though as I already advised to take only
k
elements, you could instead usefor i, value in zip(xrange(k), mat[0]):
More idiomatic instead of
while k >1:
...k-=1
would befor _ in xrange(k - 1):
Readability of
[mat[x[1]+1][x[2]], x[1]+1, x[2]]
could be improved to
[mat[row + 1][col], row + 1, col]
by unpacking
value, row, col = heapq.heappop(hp)
Revised code:
def kth_smallest(k, mat):
'''Return the kth smallest element in row wise and column wise sorted matrix.
mat represents the matrix as list of lists.
'''
m = len(mat)
hp = [(value, 0, col) for col, value in zip(xrange(k), mat[0])]
# hp is a min heap because mat[0] is assumed to be sorted
for _ in xrange(k - 1):
value, row, col = heapq.heappop(hp)
row += 1
if row < m:
heapq.heappush(hp, (mat[row][col], row, col))
return hp[0][0]
-
\$\begingroup\$ I added your code to the timing table in my answer. Obviously your solution scales a lot better. \$\endgroup\$Graipher– Graipher2017年04月19日 11:17:46 +00:00Commented Apr 19, 2017 at 11:17
A very easy way is to hope that list additions and the standard Python sort are fast (the latter being especially fast for partially sorted arrays).
In that case you could have used the second argument of the sum
function as a starting point for the sum and used:
def kth_smallest(k, mat):
entries = sorted(sum(mat, []))
return entries[k - 1]
On my machine, for the example given, this outperforms your solution for small matrices, but breaks down for larger matrices. It is, however a lot easier to understand.
+----+---------------------------------------+----------+---------+--------------+
| k | matrix | Graipher | Harsha | Janne Karila |
+----+---------------------------------------+----------+---------+--------------+
| 1 | [[1,10,12],[2,11,21], [3,31,45]] | 1.34 μs | 1.61 μs | 1.07 μs |
| 8 | [[1,10,12],[2,11,21], [3,31,45]] | 1.22 μs | 5.94 μs | 3.72 μs |
| 8 | [[1,10,12, 25, 38, 42, 51], | 2.43 μs | 12 μs | 5.55 μs |
| | [2,11,21, 35, 48, 52, 67], | | | |
| | [3,31,45, 47, 58, 63, 72], | | | |
| | [4, 32, 46, 48, 59, 64, 73]] | | | |
| 25 | [range(i, i+50) for i in range(50)] | 286 μs | 220 μs | 42.4 μs |
| 25 | [range(i, i+100) for i in range(100)] | 1.8 ms | 455 μs | 79.3 μs |
+----+---------------------------------------+----------+---------+--------------+
Style-wise, there are only a few PEP8 violations, all missing whitespace around operators.
-
\$\begingroup\$ Don't you think
itertools.chain.from_iterable(mat)
would be more idiomatic thansum(mat, [])
? \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2017年04月18日 19:57:19 +00:00Commented Apr 18, 2017 at 19:57 -
\$\begingroup\$ @MathiasEttinger You are probably right, will try it out tomorrow. But, I always wanted to find a use for the second argument of
sum
^^ \$\endgroup\$Graipher– Graipher2017年04月18日 20:51:32 +00:00Commented Apr 18, 2017 at 20:51 -
\$\begingroup\$ @MathiasEttinger Performance-wise it is basically the same, with
sum
being slightly faster (by about 0.2 µs). \$\endgroup\$Graipher– Graipher2017年04月18日 21:08:18 +00:00Commented Apr 18, 2017 at 21:08 -
\$\begingroup\$ This will scale badly, try a bigger array \$\endgroup\$kezzos– kezzos2017年04月19日 10:59:30 +00:00Commented Apr 19, 2017 at 10:59
This problem is already solved in the heapq
module. If you read more carefully its documentation you'll find heapq.merge
whose description boils down to "it's similar to sorted(itertools.chain(*iterables))
[pretty much as in @Graipher answer] but more efficient".
The last thing left is to extract out the \$k^{th}\$ value out of the returned generator. You can do it manually by using next
k times or using itertools.islice
to do it for you:
from heapq import merge
from itertools import islice
def kth_smallest(k, mat):
sorted = merge(*mat)
sorted_from_k = islice(sorted, k, None)
try:
return next(sorted_from_k)
except StopIteration:
raise IndexError(k)
In case of very large number of rows in the original matrix, you may want to limit their unpacking. You can try slicing:
def kth_smallest(k, mat):
sorted = merge(*mat[:k])
sorted_from_k = islice(sorted, k, None)
try:
return next(sorted_from_k)
except StopIteration:
raise IndexError(k)
[[0, 1], [1, 2]]
,1
or2
? All our functions would say1
(yours included). \$\endgroup\$