I'm trying to speed up the execution of the following problem. I have two functions that are used together. The first function simulates N coin flips and returns an array of strings with the results:
def throw_a_coin(N):
    return np.random.choice(['H','T'], size=N)
The second function takes number_of_samples and sample_size and returns sample_probs. It simulates number_of_samples runs of sample_size coin flips and, for each run, stores num_heads / sample_size in sample_probs. Here is the code:
def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        num_heads = sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs
Now I want to simulate multiple runs with different sample sizes, doing 200 runs for each sample size. So for sample_size = 1 I would do 200 runs of 1 coin flip, for sample_size = 10 I would do 200 runs of 10 coin flips, and so on.
Here is the code that takes so long:
mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))
for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)
I am pretty sure this process can be improved by getting rid of the for loops and using array operations instead, but I just cannot think of how to apply throw_a_coin or make_throws to an array.
- To the 2 people who voted to close this question: what exactly is wrong with it? – Mast ♦, Apr 15, 2018 at 11:42
- @πάνταῥεῖ The code looks very real to me. All required methods are in place so this is fully reviewable. – Simon Forsberg, Apr 15, 2018 at 13:02
1 Answer
Here is the toy example I built from your snippets:
import numpy as np

def throw_a_coin(N):
    return np.random.choice(['H','T'], size=N)

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        num_heads = sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = float(num_heads) / sample_size
    return sample_probs

sample_sizes = [np.random.randint(1, 1e3) for idx in range(500)]

mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))
for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)
cProfile (ordered by cumtime) reveals the problem; the script runs in 82.359 seconds:
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 82.359 82.359 {built-in method builtins.exec}
1 0.002 0.002 82.359 82.359 fast_flip.py:1(<module>)
500 0.940 0.002 82.200 0.164 fast_flip.py:8(make_throws)
100000 79.111 0.001 79.111 0.001 {built-in method builtins.sum}
100000 0.058 0.000 2.148 0.000 fast_flip.py:4(throw_a_coin)
100000 1.455 0.000 2.090 0.000 {method 'choice' of 'mtrand.RandomState' objects}
100000 0.207 0.000 0.635 0.000 fromnumeric.py:2456(prod)
100000 0.026 0.000 0.429 0.000 _methods.py:34(_prod)
101500 0.409 0.000 0.409 0.000 {method 'reduce' of 'numpy.ufunc' objects}
6 0.000 0.000 0.176 0.029 __init__.py:1(<module>)
There is a steep gap after builtins.sum, i.e. you spend most of your time there. We can use np.sum instead, pushing the total down to 3.457 seconds:
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 3.457 3.457 {built-in method builtins.exec}
1 0.002 0.002 3.457 3.457 fast_flip.py:1(<module>)
500 0.905 0.002 3.307 0.007 fast_flip.py:8(make_throws)
100000 0.046 0.000 1.869 0.000 fast_flip.py:4(throw_a_coin)
100000 1.287 0.000 1.823 0.000 {method 'choice' of 'mtrand.RandomState' objects}
201500 0.702 0.000 0.702 0.000 {method 'reduce' of 'numpy.ufunc' objects}
100000 0.172 0.000 0.536 0.000 fromnumeric.py:2456(prod)
100000 0.136 0.000 0.532 0.000 fromnumeric.py:1778(sum)
100000 0.023 0.000 0.378 0.000 _methods.py:31(_sum)
100000 0.021 0.000 0.364 0.000 _methods.py:34(_prod)
6 0.000 0.000 0.178 0.030 __init__.py:1(<module>)
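For clarity, the only change behind this second profile is swapping the Python built-in sum for np.sum inside make_throws; a minimal sketch of that version:

```python
import numpy as np

def throw_a_coin(N):
    return np.random.choice(['H', 'T'], size=N)

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        # np.sum reduces the boolean array in C instead of iterating
        # element by element in the Python interpreter
        num_heads = np.sum(throw_a_coin(sample_size) == "H")
        sample_probs[i] = num_heads / sample_size
    return sample_probs
```

The built-in sum calls `__next__` and `__add__` once per element, which is where the 79 seconds went; np.sum performs one vectorized reduction.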
Further, we can replace the strings "H" and "T" with a Boolean and stay in numpy for longer (down to 1.633 seconds):
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 1.633 1.633 {built-in method builtins.exec}
1 0.001 0.001 1.633 1.633 fast_flip.py:1(<module>)
500 0.101 0.000 1.485 0.003 fast_flip.py:11(make_throws)
100000 0.169 0.000 0.893 0.000 fast_flip.py:4(throw_a_coin)
100000 0.724 0.000 0.724 0.000 {method 'uniform' of 'mtrand.RandomState' objects}
100000 0.122 0.000 0.491 0.000 fromnumeric.py:1778(sum)
100000 0.024 0.000 0.354 0.000 _methods.py:31(_sum)
101500 0.334 0.000 0.334 0.000 {method 'reduce' of 'numpy.ufunc' objects}
6 0.000 0.000 0.178 0.030 __init__.py:1(<module>)
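The Boolean variant itself isn't shown in the post; judging by the 'uniform' entry in the profile above, it presumably looked something like this sketch (True standing in for "H"):

```python
import numpy as np

def throw_a_coin(N):
    # Threshold uniform noise to get a fair coin; True counts as heads
    return np.random.uniform(size=N) > 0.5

def make_throws(number_of_samples, sample_size):
    sample_probs = np.zeros(number_of_samples)
    for i in range(number_of_samples):
        # Summing booleans counts the heads directly,
        # with no string comparison or intermediate array of objects
        sample_probs[i] = np.sum(throw_a_coin(sample_size)) / sample_size
    return sample_probs
```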
Next, we can get rid of throw_a_coin entirely and instead sample a number_of_samples x sample_size array of uniformly distributed random numbers and threshold them. This also lets us vectorize the inner for loop and stay in numpy even longer (0.786 seconds):
ncalls tottime percall cumtime percall filename:lineno(function)
133/1 0.003 0.000 0.786 0.786 {built-in method builtins.exec}
1 0.003 0.003 0.786 0.786 fast_flip.py:1(<module>)
500 0.053 0.000 0.634 0.001 fast_flip.py:4(make_throws)
500 0.526 0.001 0.526 0.001 {method 'uniform' of 'mtrand.RandomState' objects}
6 0.000 0.000 0.179 0.030 __init__.py:1(<module>)
Here is the code:
import numpy as np

def make_throws(number_of_samples, sample_size):
    # True == Heads
    throws = np.random.uniform(size=(number_of_samples, sample_size)) > 0.5
    sample_probs = np.sum(throws, axis=1) / sample_size
    return sample_probs

sample_sizes = [np.random.randint(1, 1e3) for idx in range(500)]

mean_of_sample_means = np.zeros(len(sample_sizes))
std_dev_of_sample_means = np.zeros(len(sample_sizes))
for i in range(len(sample_sizes)):
    prob = make_throws(200, sample_sizes[i])
    mean_of_sample_means[i] = np.mean(prob)
    std_dev_of_sample_means[i] = np.std(prob)
At this point we could start to tackle the remaining for loop, but that would get into micro-optimization. One might consider aggregating the results in prob and then calling np.mean(prob, axis=1) and np.std(prob, axis=1) outside the loop, but that only nets about 20 ms, so it's more of a personal-preference thing.
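That last idea can be sketched as follows. Because number_of_samples is the same 200 for every sample size, the per-size results stack into one 2-D array, and the mean/std reductions move outside the loop (the names follow the snippets above; the loop over sample sizes itself remains, which is why the gain is small):

```python
import numpy as np

def make_throws(number_of_samples, sample_size):
    # True == Heads
    throws = np.random.uniform(size=(number_of_samples, sample_size)) > 0.5
    return np.sum(throws, axis=1) / sample_size

sample_sizes = [np.random.randint(1, 1000) for idx in range(500)]

# Stack the 200 per-run probabilities for each sample size into a
# (len(sample_sizes), 200) array, then reduce along axis 1 in one call each.
probs = np.stack([make_throws(200, n) for n in sample_sizes])
mean_of_sample_means = np.mean(probs, axis=1)
std_dev_of_sample_means = np.std(probs, axis=1)
```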
- Wow, very detailed and very helpful! I didn't know I could see how long each single operation takes! Thank you :) – David P, Apr 15, 2018 at 17:32
- @DavidP Yes, that's what profilers are for. They exist in every major language, so this is not a Python-specific thing. There is also LineProfiler, which lets you view the runtime per line for specific functions. – FirefoxMetzger, Apr 15, 2018 at 17:52
- I will save this comment in my brain! I didn't know about profilers, and they will help me for sure in the future, thanks a lot! – David P, Apr 16, 2018 at 16:42