I am consistently timing out on the mentioned kata:
Task You will be given an array of random integers and a number n. You have to extract n smallest integers out of it preserving the original order.
Examples
performant_smallest([1, 2, 3, 4, 5], 3) == [1, 2, 3] performant_smallest([5, 4, 3, 2, 1], 3) == [3, 2, 1] performant_smallest([1, 2, 3, 4, 1], 3) == [1, 2, 1] performant_smallest([1, 2, 3, -4, 0], 3) == [1, -4, 0] performant_smallest([2, 1, 3, 2, 3], 3) == [2, 1, 2]
Notes
- The number
n
will always be smaller then the array length- There will be duplicates in the array, and they have to be returned in the order of their each separate appearence
Your solution has to be at least as fast as reference, passing the tests in12000+ ms
- If you didn't pass a small amount of tests, try submitting again. Otherwise, your solution is not efficient enough.
Test suite
Tests: 1500 Array size: 4500 Numbers range: [-100..100] Number of elements to return: 50-100% of the array
Note: you can't use Python 2
I wrote several different functions for this:
from heapq import *
from collections import Counter as count
from copy import copy
def heap_pop(lst, n):
heap, res, i = copy(lst), [0]*n, 0
heapify(heap)
c = count(heappop(heap) for _ in range(n))
for itm in lst:
if c.get(itm, 0) > 0:
res[i] = itm
i += 1
c[itm] -= 1
if i == n: break
return res
def sorting(lst, n):
res, i = [0]*n, 0
c = count(sorted(lst)[:n])
for itm in lst:
if c.get(itm, 0) > 0:
res[i] = itm
i += 1
c[itm] -= 1
if i == n: break
return res
def n_smallest(lst, n):
heap, res, i = copy(lst), [0]*n, 0
heapify(heap)
c = count(nsmallest(n, heap))
for itm in lst:
if c.get(itm, 0) > 0:
res[i] = itm
i += 1
c[itm] -= 1
if i == n: break
return res
def counter_sort(lst, n):
c = count(lst)
d, x, res, sm = sorted(c), {}, [0]*n, 0
for k in d:
z = min(c[k], n-sm)
x[k] = z
sm += z
if sm == n: break
sm = 0
for itm in lst:
if x.get(itm, 0) > 0:
res[sm] = itm
sm += 1
x[itm] -= 1
if sm == n: break
return res
def cs2(lst, n):
c = count(lst)
d, x, res, sm = sorted(c), {}, [0]*n, 0
for k in d:
z = min(c[k], n-sm)
x[k] = z
sm += z
if sm == n: break
sm = 0
for itm in (itm for itm in lst if itm in x):
if x.get(itm, 0) > 0:
res[sm] = itm
sm += 1
x[itm] -= 1
if sm == n: break
return res
def csy(lst, n):
c, sm = [0]*201, 0
for itm in lst: c[itm+100] += 1 #Map each number to num+100.
x, res, sm = {}, [0]*n, 0
for k in range(201):
z = min(c[k], n-sm)
x[k-100] = z
sm += z
if sm == n: break
sm = 0
for itm in lst:
if x.get(itm, 0) > 0:
res[sm] = itm
sm += 1
x[itm] -= 1
if sm == n: break
return res
def csz(lst, n):
c = [0]*201 #Our counter.
for itm in lst: c[itm+100] += 1 #Map each number to num+100.
res, sm = [0]*n, 0
for k in range(201): #Iterate through the numbers in our counter.
sm += c[k]
if sm >= n:
c[k] += n - sm #The sum of `c[:k+1]` should be equal to `n`, and this would give us the count of the `n` smallest elements.
break
sm = 0
for itm in lst: #Iterate through the list to present the elements in their appearance order in the list.
v = itm+100 #The mapping between the list item and its index in our counter.
if v <= k and c[v] > 0: #The item is one of the `n` smallest items.
res[sm] = itm #Place it in its position in the result list.
sm += 1
c[v] -= 1
if sm == n: break
return res
I benchmarked all of them (lst
is 4500 random numbers in [-100, 100], and n
in [2250, 4500] with 10000 iterations) to simulate the actual data set. I'm very surprised that none of my counting sort implementations worked considering that it is linear time and there were only two \2ドル ,円 O(n)\$ iterations in it.
csz()
was the clear best performer, but it only gets to 1068 of the test cases before failing (the same applies to all my counting sort implementations; despite showing massive difference under the benchmark (17.6 - 29.2s) they all reach and terminate at 1068 test cases).
-
\$\begingroup\$ Added new functions I tried. I don't think I can do any better. \$\endgroup\$Tobi Alafin– Tobi Alafin2019年02月16日 19:29:38 +00:00Commented Feb 16, 2019 at 19:29
1 Answer 1
The Kata was broken—perhaps Codewars server load currently is a lot more than it was 9 months ago when the question was made—but for whatever reason, none of the top 6 solutions could pass within the time constraints imposed on the question. The kata was reworked after someone filed an issue, and now my solution passes. I did write an even more micro optimised version of counting sort which is what I eventually submitted. The only change I made was that since n
was 50-100% of the list length it would be faster to find the n smallest elements by counting off the (length - n
) largest elements than counting up to n
, and since the array had a fixed length for the random tests, we could do that. Here's my final version:
My Code
count = 0
def performant_smallest(lst, n):
global count
count += 1
c = [0]*101 #Our counter.
for itm in lst: c[itm+50] += 1 #Map each number to num+50
res, sm, stop = [0]*n, 0, (5 - n if count <= 5 else 10000 - n)
for k in range(100, -1, -1): #Iterate through the numbers in our counter starting backwards to get even better performance, since `n` is large.
sm += c[k]
if sm >= stop:
c[k] = sm - stop #The sum of `c[:k+1]` should be equal to `n`, and this would give us the count of the `n` smallest elements.
break
sm = 0
for itm in lst: #Iterate through the list to present the elements in their appearance order in the list.
v = itm+50 #The mapping between the list item and its index in our counter.
if v <= k and c[v] > 0: #The item is one of the `n` smallest items.
res[sm] = itm #Place it in its position in the result list.
sm += 1
c[v] -= 1
if sm == n: break
return res
Explore related questions
See similar questions with these tags.