I'm trying to write a short random walk simulator in which an agent has a 50% chance on each "round" of either moving one step toward x or one step back toward its initial position. If the agent is already at its initial position and has to move back, it stays there instead.
I'm getting a lot of variance in the results and running the function with a distance to x of 2000 steps is already taking a while.
I was thinking I'd run the simulator with x = 200 steps 100 times and take the average of that, but with 40,200 steps on average when x is 200 steps away, it's taking almost 10 seconds to do the 100 runs, which seems extremely slow.
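(As a quick sanity check on that figure: for a symmetric walk reflected at the origin, the expected number of rounds to first reach \$x\$ works out to \$x^2 + x\,ドル which for \$x = 200\$ gives \200ドル^2 + 200 = 40200\$.)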
Are there things I could optimize in my code to make it faster? Also, if you see glaring problems with the design, suboptimal memory usage, etc., please don't hesitate to comment on those as well, thank you.
PS: I use a random integer function over [0, 1] for my 50% probability.
import numpy as np
import random as rand

def randomWalk(x):
 position = 0
 steps = 0
 while (position != x):
 move = rand.randint(0,1)
 steps += 1
 if (move == 1):
 position += 1
 elif (move == 0) & (position != 0):
 position -= 1
 return steps

trials = [0] * 100
for i in range(100):
 trials[i] = randomWalk(200)

sum = 0.
for trial in trials:
 sum += trial

avg = 0.
avg = sum / len(trials)

print avg
2 Answers
Performance
I threw a profiling program at your code (the Spyder IDE built-in profiler). The results showed that most of the time is spent in rand.randint(0, 1). Instead of calling rand.randint(0, 1) and checking for 0 and 1, use rand.random(). This will give you a random number in the interval \$[0, 1)\,ドル and your condition then becomes \$< 0.5\$. On my machine this reduced the running time from about \10ドル\$-\12ドル\,s\$ to around \1ドル\$-\1ドル.5\,s\,ドル which is roughly an order of magnitude.
My theory on this is that Python builds rand.randint(low, high) on top of rand.random() (or at least on the same source of random values) and maps those random values to integers based on the given limits. This mapping probably introduces a small overhead, which is not big enough to be a real problem if you do not call the function too often. But in your program the function is called a lot (the profiler says more than 4 million times), so the overhead adds up and becomes noticeable.
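If you want to verify this on your own machine, a quick micro-benchmark could look like the sketch below (my own check, not part of the profiling run; the absolute numbers will differ per machine and Python version):

import timeit

# Time one million calls of each variant; randint() does extra argument
# handling on every call, which is where the overhead comes from.
n = 10**6
t_randint = timeit.timeit("rand.randint(0, 1)", setup="import random as rand", number=n)
t_random = timeit.timeit("rand.random()", setup="import random as rand", number=n)
print("randint: {:.2f}s random: {:.2f}s".format(t_randint, t_random))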
Code and style
Style
I don't know if there is a style requirement where you are writing this program, but in general, Python variable and function names should be all_lower_case (see the infamous PEP8).
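For your code that mostly means renaming the function (a rename-only sketch, behaviour unchanged):

# snake_case per PEP8; only the name differs
def random_walk(x): # instead of randomWalk(x)
 pass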
Documentation
It is always nice to have at least a little documentation for your functions, especially since Python is a dynamically typed language and others (or a future version of you) may find the code somewhere and want to reuse it. Again, there is a nice reference for Python (link).
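For example, a docstring along these lines (my wording, just to illustrate the idea) would already help a lot:

def randomWalk(x):
 """Return the number of rounds a walk reflected at 0 needs to reach x.

 Each round the walker steps +1 toward x with probability 0.5;
 otherwise it steps -1, unless it is already at the starting position.
 """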
Comparison operator
There is no need to check for move == 0 in the elif part, since you know the value can only be 0 or 1. Please use and, and not the bitwise AND &, when you concatenate booleans. See this SO question for an elaborate explanation of the differences.
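A short demonstration of why the distinction matters: & is a bitwise operator that binds tighter than comparisons and does not short-circuit, so dropping the parentheses silently changes the meaning (toy values chosen for illustration):

move, position = 0, 5
# With explicit parentheses both variants happen to agree here ...
print((move == 0) & (position != 0)) # True (bitwise AND of two bools, no short-circuit)
print((move == 0) and (position != 0)) # True (boolean AND, short-circuits)
# ... but & binds tighter than ==, so without parentheses this parses as
# move == (0 & position) != 0, a chained comparison, and prints False:
print(move == 0 & position != 0) # False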
List comprehension
There is no real need to preallocate space with trials = [0] * 100. You should use a list comprehension instead:
trials = [randomWalk(200) for _ in xrange(100)]
Result
I had a few moments of time, so I played a little with your code. I implemented the suggestions from my answer, added some documentation, introduced another function, added some variables, and used numpy to have a little bit of extra fun. The code I came up with follows.
import numpy as np
import random


def random_walk(x):
 """Simulate a single random walk"""
 position = 0
 steps = 0
 while (position != x):
 move = random.random()
 steps += 1
 if move < 0.5:
 position += 1
 elif position != 0:
 position -= 1
 return steps


def random_walk_simulator(n_walks=100, walk_distance=200):
 """Simulate a bunch of random walks"""
 trials = np.array([random_walk(walk_distance) for _ in range(n_walks)])
 print("Mean # of steps: {:.2f}".format(trials.mean()))
 print("STD # of steps: {:.2f}".format(trials.std()))


if __name__ == '__main__':
 random_walk_simulator(100, 200)
A note on numpy: I cannot say for sure whether there is a real performance difference between using numpy and a plain list for calculating the mean values etc. From my experience I know that numpy has some seriously optimised algorithms that allow fast and numerically safe(r) computations on large arrays. Additionally, numpy allows convenient handling of lists/arrays/matrices with more than one dimension. So in your case (without an in-depth analysis) it basically comes down to personal preference and maybe convenience.
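To give a feel for the convenience side, here is a tiny comparison (the sample values are made up for the example, not measured results):

import numpy as np

trials = [180, 240, 199, 305] # made-up trial lengths, just for illustration

# plain-Python mean
mean_py = sum(trials) / float(len(trials))

# numpy: mean and standard deviation as short, optimised calls
arr = np.array(trials)
print(mean_py, arr.mean(), arr.std())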
- Thank you for all the advice, this is very helpful. Could I ask you what the profiling program you used is? I didn't know such a thing existed. Do you have a theory as to why changing from randint to random saves that much time? Lastly, what is the difference between using 'and' and '&'? – jeremy radcliff, Mar 21, 2016 at 18:03
- Sorry, just one more thing: is using a numpy array faster than a list when you run computations on its elements (such as sum, mean, std, etc.)? – jeremy radcliff, Mar 21, 2016 at 18:08
- @jeremyradcliff: Please see my edits. – AlexV, Mar 21, 2016 at 18:40
- Thanks for the links and clarifications. I really learned a lot from your post, that was great! – jeremy radcliff, Mar 21, 2016 at 19:10
trials = [0] * 100
for i in range(100):
 trials[i] = randomWalk(200)
You don't need to create a list of zeroes in order to create a list of random walks. Just use a list comprehension:
trials = [randomWalk(200) for _ in xrange(100)] # In Python3, you should use range().
sum = 0.
for trial in trials:
 sum += trial

avg = 0.
avg = sum / len(trials)
You shouldn't use sum as a variable name because it conflicts with the built-in function. You know what? You can use that built-in function:
avg = sum(trials, 0.) / len(trials)
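If you ever move to Python 3.4 or newer, the standard-library statistics module is another option for this (a small sketch on my part, not something your current code uses):

from statistics import mean # Python 3.4+

avg = mean(trials) # arithmetic mean of the trial lengths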
print avg
I'm assuming that you are using Python 2 just because of that line (and because of the decimal points in your numbers). If you put parentheses around avg, your program would run in both versions. Since I used xrange() above, which no longer exists in Python 3, you can't now; but you could also use range() up there to make it compatible with both. range() takes up more memory than xrange() in Python 2, but with only a hundred numbers it shouldn't really make much difference anyway.
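Putting those pieces together, a version of the driver code that runs unchanged on both Python 2 and Python 3 might look like this (a sketch, reusing your randomWalk as-is):

trials = [randomWalk(200) for _ in range(100)] # range() exists in both versions
avg = sum(trials, 0.) / len(trials) # built-in sum with a float start value
print(avg) # with parentheses this is valid in Python 2 and Python 3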
- Why do you import numpy if you're not using it?