I was implementing a rolling median solution and was not sure why my Python implementation was around 40 times slower than my C++ implementation.
Here are the complete implementations:
C++
#include <iostream>
#include <vector>
#include <string.h>
using namespace std;
int tree[17][65536];
void insert(int x) { for (int i=0; i<17; i++) { tree[i][x]++; x/=2; } }
void erase(int x) { for (int i=0; i<17; i++) { tree[i][x]--; x/=2; } }
int kThElement(int k) {
int a=0, b=16;
while (b--) { a*=2; if (tree[b][a]<k) k-=tree[b][a++]; }
return a;
}
long long sumOfMedians(int seed, int mul, int add, int N, int K) {
long long result = 0;
memset(tree, 0, sizeof(tree));
vector<long long> temperatures;
temperatures.push_back( seed );
for (int i=1; i<N; i++)
temperatures.push_back( ( temperatures.back()*mul+add ) % 65536 );
for (int i=0; i<N; i++) {
insert(temperatures[i]);
if (i>=K) erase(temperatures[i-K]);
if (i>=K-1) result += kThElement( (K+1)/2 );
}
return result;
}
// default input
// 47 5621 1 125000 1700
// output
// 4040137193
int main()
{
int seed,mul,add,N,K;
cin >> seed >> mul >> add >> N >> K;
cout << sumOfMedians(seed,mul,add,N,K) << endl;
return 0;
}
Python
def insert(tree,levels,n):
for i in xrange(levels):
tree[i][n] += 1
n /= 2
def delete(tree,levels,n):
for i in xrange(levels):
tree[i][n] -= 1
n /= 2
def kthElem(tree,levels,k):
a = 0
for b in reversed(xrange(levels)):
a *= 2
if tree[b][a] < k:
k -= tree[b][a]
a += 1
return a
def main():
seed,mul,add,N,K = map(int,raw_input().split())
levels = 17
tree = [[0] * 65536 for _ in xrange(levels)]
temps = [0] * N
temps[0] = seed
for i in xrange(1,N):
temps[i] = (temps[i-1]*mul + add) % 65536
result = 0
for i in xrange(N):
insert(tree,levels,temps[i])
if (i >= K):
delete(tree,levels,temps[i-K])
if (i >= K-1):
result += kthElem(tree,levels,((K+1)/2))
print result
# default input
# 47 5621 1 125000 1700
# output
# 4040137193
main()
On the above mentioned input (in the comments of the code), the C++ code took around 0.06 seconds while Python took around 2.3 seconds.
Can some one suggest the possible problems with my Python code and how to improve to less than 10x performance hit?
I don't expect it to be anywhere near C++ implementation but to the order of 5-10x. I know I can optimize this by using libraries like NumPy (and/or SciPy). I am asking this question from the point of view of using Python for solving programming challenges. These libraries are usually not allowed in these challenges. I am just asking if it is even possible to beat the time limit for this algorithm in Python.
In case somebody is interested, the C++ code is borrowed from floating median problem here.
2 Answers 2
Integer math in tight loops is an area where C++ will definitely run circles around Python. I think your expectations are too high. However, I'd like to point out a couple of things for you, even if they are not enough to reach your goal.
Avoid indexing when you can use direct iteration instead. For example, insert
can be written like this and Python won't have to deal with i
:
def insert(tree,n):
for level in tree:
level[n] += 1
n /= 2
Python does not optimize constant subexpressions out of loops, so you'll want to compute (K+1)/2
before the main loop and store it in a variable.
A 40x speed difference between C++ and Python for this sort of work is not uncommon. Implementing the optimizations by @JanneKarilla helps. Additionally you can ommit both if
statements by looping in two segments, first from 0 to K, then from K to N, which shaves off a fair amount of time.
The fastest that I can come up with uses the "maintain a sorted array" approach and is about 4.5 x faster than your original code for the input given:
from collections import deque
from bisect import bisect_left, insort
def rolling_median(iterable, K):
# taking iterator so it works on sequences and generators
it = iter(iterable)
# a deque has optimized access from its endpoints
# we consume and store the first K values
deq = deque(next(it) for _ in xrange(K))
# initialize the sorted array
sor = sorted(deq)
# index of median in sorted list
i = (K+1)//2 -1
yield sor[i]
for r in it:
# `deq` keeps track of chrological order
out = deq.popleft()
deq.append(r)
# `sor` keeps track of sortedness
sor.pop(bisect_left(sor, out))
insort(sor, r)
yield sor[i]
def RNG(seed, mul, add, N):
tk = seed
for _ in xrange(N):
yield tk
tk = (tk * mul + add) % 65536
def main(seed=47, mul=5621, add=1, N=125000, K=1700):
rand = RNG(seed, mul, add, N)
return sum(rolling_median(rand, K))
It's possible to shave off some more milisec I'm sure, but I wanted to write a nice and general rolling median function :)
array
module from the standard library instead. It provide an efficient array class for numeric values. \$\endgroup\$% 65536
all about, for example? \$\endgroup\$