I'm trying to speed up the execution of the following problem. I have two functions that are used together. The first function simulates N coin flips and returns an array of strings with the results:
def throw_a_coin(N):
    return np.random.choice(['H','T'], size=N)
The second function takes number_of_samples and sample_size and returns sample_probs. It simulates number_of_samples runs of sample_size coin flips and, for each run, stores num_heads / sample_size in sample_probs. Here is the code:
def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        num_heads = sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs
Now I want to simulate multiple runs with different sample sizes, doing 200 runs for each sample size. So for sample_size = 1 I would do 200 runs of 1 coin flip, for sample_size = 10 I would do 200 runs of 10 coin flips, and so on.
Here is the code that takes so long:
mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))
for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)
I am pretty sure this process can be improved by getting rid of the for loops and using array operations instead, but I just cannot think of how to apply throw_a_coin or make_throws to an array.
- To the 2 people who voted to close this question: what exactly is wrong with it? – Mast ♦, Apr 15, 2018 at 11:42
- @πάνταῥεῖ The code looks very real to me. All required methods are in place so this is fully reviewable. – Simon Forsberg, Apr 15, 2018 at 13:02
1 Answer
Here is the toy example I built from your snippets:
import numpy as np

def throw_a_coin(N):
    return np.random.choice(['H','T'], size=N)

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        num_heads = sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs

sample_sizes = [np.random.randint(1, 1e3) for idx in range(500)]

mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))
for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)
cProfile (ordered by cumtime) reveals the problem; the script runs in 82.359 seconds:
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 82.359 82.359 {built-in method builtins.exec}
1 0.002 0.002 82.359 82.359 fast_flip.py:1(<module>)
500 0.940 0.002 82.200 0.164 fast_flip.py:8(make_throws)
100000 79.111 0.001 79.111 0.001 {built-in method builtins.sum}
100000 0.058 0.000 2.148 0.000 fast_flip.py:4(throw_a_coin)
100000 1.455 0.000 2.090 0.000 {method 'choice' of 'mtrand.RandomState' objects}
100000 0.207 0.000 0.635 0.000 fromnumeric.py:2456(prod)
100000 0.026 0.000 0.429 0.000 _methods.py:34(_prod)
101500 0.409 0.000 0.409 0.000 {method 'reduce' of 'numpy.ufunc' objects}
6 0.000 0.000 0.176 0.029 __init__.py:1(<module>)
There is a steep gap after builtins.sum, i.e. you spend most of your time there. We can use np.sum instead, pushing the total down to 3.457 seconds:
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 3.457 3.457 {built-in method builtins.exec}
1 0.002 0.002 3.457 3.457 fast_flip.py:1(<module>)
500 0.905 0.002 3.307 0.007 fast_flip.py:8(make_throws)
100000 0.046 0.000 1.869 0.000 fast_flip.py:4(throw_a_coin)
100000 1.287 0.000 1.823 0.000 {method 'choice' of 'mtrand.RandomState' objects}
201500 0.702 0.000 0.702 0.000 {method 'reduce' of 'numpy.ufunc' objects}
100000 0.172 0.000 0.536 0.000 fromnumeric.py:2456(prod)
100000 0.136 0.000 0.532 0.000 fromnumeric.py:1778(sum)
100000 0.023 0.000 0.378 0.000 _methods.py:31(_sum)
100000 0.021 0.000 0.364 0.000 _methods.py:34(_prod)
6 0.000 0.000 0.178 0.030 __init__.py:1(<module>)
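For clarity, the only change behind this second profile is swapping the Python built-in sum for np.sum inside make_throws; a minimal sketch of that version:

```python
import numpy as np

def throw_a_coin(N):
    return np.random.choice(['H', 'T'], size=N)

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        # np.sum reduces the boolean array in C instead of iterating
        # element by element in the Python interpreter
        num_heads = np.sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = num_heads / sample_size
    return sample_probs
```

The built-in sum calls `__next__` and `__add__` once per element, which is where the 79 seconds went; np.sum performs one vectorized reduction.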
Further, we can replace the strings "H" and "T" with a Boolean and stay in numpy for longer (down to 1.633 seconds):
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 1.633 1.633 {built-in method builtins.exec}
1 0.001 0.001 1.633 1.633 fast_flip.py:1(<module>)
500 0.101 0.000 1.485 0.003 fast_flip.py:11(make_throws)
100000 0.169 0.000 0.893 0.000 fast_flip.py:4(throw_a_coin)
100000 0.724 0.000 0.724 0.000 {method 'uniform' of 'mtrand.RandomState' objects}
100000 0.122 0.000 0.491 0.000 fromnumeric.py:1778(sum)
100000 0.024 0.000 0.354 0.000 _methods.py:31(_sum)
101500 0.334 0.000 0.334 0.000 {method 'reduce' of 'numpy.ufunc' objects}
6 0.000 0.000 0.178 0.030 __init__.py:1(<module>)
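The Boolean variant itself isn't shown in the post; judging by the 'uniform' entry in the profile above, it presumably looked something like this sketch (True standing in for "H"):

```python
import numpy as np

def throw_a_coin(N):
    # Threshold uniform noise to get a fair coin; True counts as heads
    return np.random.uniform(size=N) > 0.5

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        # Summing booleans counts the heads directly,
        # with no string comparison or intermediate array of objects
        sample_probs[i] = np.sum(throw_a_coin(sample_size)) / sample_size
    return sample_probs
```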
Next, we can get rid of throw_a_coin entirely and instead sample a number_of_samples x sample_size array of uniformly distributed random numbers and threshold them. This also lets us vectorize the inner for loop and stay in numpy even longer (0.786 seconds):
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 0.786 0.786 {built-in method builtins.exec}
1 0.003 0.003 0.786 0.786 fast_flip.py:1(<module>)
500 0.053 0.000 0.634 0.001 fast_flip.py:4(make_throws)
500 0.526 0.001 0.526 0.001 {method 'uniform' of 'mtrand.RandomState' objects}
6 0.000 0.000 0.179 0.030 __init__.py:1(<module>)
Here is the code:
import numpy as np

def make_throws(number_of_samples, sample_size):
    # True == Heads
    throws = np.random.uniform(size=(number_of_samples, sample_size)) > 0.5
    sample_probs = np.sum(throws, axis=1) / sample_size
    return sample_probs

sample_sizes = [np.random.randint(1, 1e3) for idx in range(500)]

mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))
for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)
At this point we could start to tackle the remaining for loop, but that would get into micro-optimization. One might consider aggregating the results in prob and then calling np.mean(prob, axis=1) and np.std(prob, axis=1) outside the loop, but that only nets about 20 ms, so it's more of a personal-preference thing.
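That last idea can be sketched as follows. Because number_of_samples is the same 200 for every sample size, the per-size results stack into one 2-D array, and the mean/std reductions move outside the loop (the names follow the snippets above; the loop over sample sizes itself remains, which is why the gain is small):

```python
import numpy as np

def make_throws(number_of_samples, sample_size):
    # True == Heads
    throws = np.random.uniform(size=(number_of_samples, sample_size)) > 0.5
    return np.sum(throws, axis=1) / sample_size

sample_sizes = [np.random.randint(1, 1000) for idx in range(500)]

# Stack the 200 per-run probabilities for each sample size into a
# (len(sample_sizes), 200) array, then reduce along axis 1 in one call each.
probs = np.stack([make_throws(200, n) for n in sample_sizes])
mean_of_sample_means = np.mean(probs, axis=1)
std_dev_of_sample_means = np.std(probs, axis=1)
```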
- Wow, very detailed and very helpful! I didn't know I could see how long each single operation takes! Thank you :) – David P, Apr 15, 2018 at 17:32
- @DavidP Yes, that's what profilers are for. They exist in every major language, so this is not a Python-specific thing. There is also LineProfiler, which lets you view the runtime per line for specific functions. – FirefoxMetzger, Apr 15, 2018 at 17:52
- I will save this comment in my brain! I didn't know about profilers, and they will help me for sure in the future, thanks a lot! – David P, Apr 16, 2018 at 16:42