In order to learn the basics of Monte Carlo I calculated pi with it. I also wrote an explanation of the reasoning behind the code.
Down here you can see the circle with random points that I simulated in my code.
enter image description here
"""
This programme calculates pi with Monte Carlo
Given a square and a circle inside it.
We have
Area_of_the_square = LENGTH ** 2
Area_of_the_circle = radius ** 2 * pi => (LENGTH ** 2) / 4 * pi
The circle is obviously smaller than the square.
We have the equation:
Area_of_the_square * number_less_than_one == Area_of_the_circle
This programme is going to put a big number of points inside the square
(I suggest TIMES_TO_REPEAT = 10**5).
It will then count how many of them are inside the circle,
number_less_than_one = points_inside_the_circle / total_points
After doing some simple math with this formula:
Area_of_the_square * number_less_than_one == Area_of_the_circle
we get that
pi = number_less_than_one * 4
NOTE: This method is deadly slow and quite complicated,
I made this programme just in order to learn.
"""
import random
TIMES_TO_REPEAT = 10**5
LENGTH = 10**5
CENTER = [LENGTH/2,LENGTH/2]
def in_circle(point):
x = point[0]
y = point[1]
center_x = CENTER[0]
center_y = CENTER[1]
radius = LENGTH/2
return (x - center_x)**2 + (y - center_y)**2 < radius**2
count = inside_count = 0
for i in range(TIMES_TO_REPEAT):
point = random.randint(1,LENGTH),random.randint(1,LENGTH)
if in_circle(point):
inside_count += 1
count += 1
pi = (inside_count / count) * 4
print(pi)
6 Answers 6
Your code seems to be working. Here are a few things to make it better.
Tuple unpacking
x = point[0]
y = point[1]
can be written :
x, y = point
Remove what is not required
You maintain a count
variable but you already know the count : it is TIMES_TO_REPEAT
.
Variable name
TIMES_TO_REPEAT
does not seem to be such a good name. NB_POINTS
would probably be better.
Removing the counting logic
Using iterators and sum
, you can remove the logic to count points :
inside_count = sum(1 for i in range(NB_POINTS) if in_circle((random.randint(1,LENGTH),random.randint(1,LENGTH))))
Underscore for throwaway values
In your iteration, you don't really care about the value of i
. It is quite a common thing to call it _
(more about this).
Move things to a function
It makes things easier to test
def compute_pi(nb_it):
inside_count = sum(1 for _ in range(nb_it) if in_circle((random.randint(1,LENGTH),random.randint(1,LENGTH))))
return (inside_count / nb_it) * 4
Add an if main
guard
It is quite common to define functions/values/classes/whatever from your python file but to use them only behind an if main
guard to make things easier to reuse via module imports for instance.
Oops
Currently, changing the value of LENGTH seems to be changing the result. This is probably wrong.
At this point, my code looks like :
import random
NB_POINTS = 10**5
LENGTH = 10**5
CENTER = [LENGTH/2,LENGTH/2]
def in_circle(point):
x,y = point
center_x, center_y = CENTER
radius = LENGTH/2
return (x - center_x)**2 + (y - center_y)**2 < radius**2
def compute_pi(nb_it):
inside_count = sum(1 for _ in range(nb_it) if in_circle((random.randint(1,LENGTH),random.randint(1,LENGTH))))
return (inside_count / nb_it) * 4
if __name__ == "__main__":
print(compute_pi(NB_POINTS))
Using the right type and the right values
In order to solve the problem above and make things easier, you could change a bit the referential you are using.
Instead of using a circle of radius R (equal to length/2) and with center (R, R) and picking integer values in square [1, R] * [1, R] maybe you could consider a circle of radius R and with center 0 and pick floating values in [-R, +R]. Even better, you could focus on the corner of the circle by considering points in [0, R]*[0, R]. Finally, R is not really important here, you could pick 1.
Then the code becomes :
import random
NB_POINTS = 10**5
def in_circle(x, y):
return x**2 + y**2 < 1
def compute_pi(nb_it):
inside_count = sum(1 for _ in range(nb_it) if in_circle(random.random(),random.random()))
return (inside_count / nb_it) * 4
if __name__ == "__main__":
print(compute_pi(NB_POINTS))
And then, this can be simplified.
import random
NB_POINTS = 10**5
def compute_pi(nb_it):
return 4 * sum(1 for _ in range(nb_it) if random.random()**2 + random.random()**2 < 1) / nb_it
if __name__ == "__main__":
print(compute_pi(NB_POINTS))
-
\$\begingroup\$ In terms of randomness, is
random.random()
evenly distributed? If not, seeing asrandom.random()**2
is always less than or equal to 1, would it not be considered the same asrandom.random()
? \$\endgroup\$flakes– flakes2014年11月10日 18:27:29 +00:00Commented Nov 10, 2014 at 18:27 -
8\$\begingroup\$ @Calpratt
random()
is uniformly distributed between 0 and 1, butrandom()**2
is not. \$\endgroup\$David Z– David Z2014年11月10日 20:58:58 +00:00Commented Nov 10, 2014 at 20:58 -
1\$\begingroup\$ Also: how is changing
LENGTH
changing the result? This is to be expected, to some extent, due to statistical fluctuations. \$\endgroup\$David Z– David Z2014年11月10日 21:03:21 +00:00Commented Nov 10, 2014 at 21:03 -
\$\begingroup\$ Would it be somehow "bad" to, instead of moving everything to one line, instead write
def in_circle: random.random()**2 + random.random()**2 < 1
? I'm new to Python myself, but I've seen a lot of emphasis on doing "one thing at a time," whereas it seems like that one line incompute_pi
is doing a lot of work, but it makes sense (at least to me) to generate the random numbers and check for their location in the same step. And also the line is more than 79 characters (as per PEP8). \$\endgroup\$shadowtalker– shadowtalker2014年11月12日 04:56:28 +00:00Commented Nov 12, 2014 at 4:56 -
\$\begingroup\$ @ssdecontrol : you are right. It would probably be better but to make this function more useful/meaningful, it should take a radius (or diameter) and the center as arguments and this would bring back the tedious details i tried to remove. As for PEP 8, i am guilty of breaking the rule of 80 characters way too often :-/. \$\endgroup\$SylvainD– SylvainD2014年11月12日 07:59:39 +00:00Commented Nov 12, 2014 at 7:59
First, a comment on the implementation: note that picking a random number between 1
and 10**5
is really no different than picking a random number between 0
and 1
; you scale everything by the same amount, so why bother? In fact, requiring the x
and y
to be integers (and not including 0
) reduces the accuracy of your method.
Removing that factor allows the code to be simplified as follows:
import random
def approximate_pi(repeats=10**5):
"""Monte Carlo simulation to approximate pi.
... explanation goes here
"""
inside = 0
for _ in range(repeats):
x, y = random.random(), random.random()
if (((0.5 - x) ** 2) + ((0.5 - y) ** 2)) <= 0.25:
inside += 1
return (4 * inside) / repeats
if __name__ == '__main__':
print approximate_pi()
Note that:
- I have moved the explanation into the function docstring, rather than have it sit outside;
- I have used the
if __name__ == '__main__'
guard to run the code, rather than have it sit directly in the module; and - I have made the number of
repeats
a parameter rather than a module-level constant.
If you did want to factor out inside_circle
, I would make LENGTH
a parameter of that function:
def inside_circle(point, side_length):
"""Return whether or not point is in the circle.
Note:
The circle has a radius of side_length / 2 and sits inside a
square with sides of length side_length. Assumes that the
square's bottom-left corner is at (0, 0), i.e. that the square
is entirely within Quadrant I.
Args:
point (tuple of int/float): The x, y coordinates of the point.
side_length (int/float): The length of the sides of the square
within which the circle sits.
"""
...
def approximate_pi(side_length, repeats=10**5):
"""Monte Carlo simulation to approximate pi."""
...
if inside_circle((x, y), side_length):
...
This allows you to from whatever import inside_circle
and use it elsewhere without having to worry about whether CENTER
is defined.
As you've already realized, computational loops in pure python are often unacceptably slow. For this sort of code you're very often better off using numpy
. For example:
import numpy as np
def pi_mc(N, seed=1234):
rndm = np.random.RandomState(seed)
x = rndm.uniform(size=N)
y = rndm.uniform(size=N)
r = x*x + y*y
return 4. * np.count_nonzero(r < 1) / N
if __name__ == "__main__":
for N in [100, 1000, 10000, 100000, 1e6]:
pp = pi_mc(N)
print N, pp, pp - np.pi
EDIT: As requested, some timings:
In [9]: for N in [100, 1000, 10000, 100000, int(1e6)]:
%timeit pi_mc(N)
...:
100000 loops, best of 3: 14.5 μs per loop
10000 loops, best of 3: 45.9 μs per loop
1000 loops, best of 3: 351 μs per loop
100 loops, best of 3: 3.36 ms per loop
10 loops, best of 3: 46.4 ms per loop
In [10]: for N in [100, 1000, 10000, 100000, int(1e6)]:
%timeit pure_python_pi_mc(N)
....:
10000 loops, best of 3: 61.5 μs per loop
1000 loops, best of 3: 548 μs per loop
100 loops, best of 3: 5.47 ms per loop
10 loops, best of 3: 54.9 ms per loop
1 loops, best of 3: 546 ms per loop
Here pure_python_pi_mc
is the code from the answer by @200_success, wrapped into a function.
As you can see, relative speed of numpy improves with the number of iterations. This is due to the fact that setting up a loop takes constant time overhead.
Two more notes about numerical side of things:
always seed your random number generator (reproducibility)
better not relying on python2 vs python3 specific division. Use
from __future__ import division
even on python 3.
-
1\$\begingroup\$ Welcome to Code Review! Would you mind including a comparison of timing results in your answer? \$\endgroup\$200_success– 200_success2014年11月10日 11:50:52 +00:00Commented Nov 10, 2014 at 11:50
-
\$\begingroup\$ @200_success done \$\endgroup\$ev-br– ev-br2014年11月10日 13:25:48 +00:00Commented Nov 10, 2014 at 13:25
-
\$\begingroup\$
always seed your random number generator (reproducibility)
: but IMO you should not set the default seed to be a definite value! The default at least should be from time... \$\endgroup\$Jorge Leitao– Jorge Leitao2014年11月10日 13:53:49 +00:00Commented Nov 10, 2014 at 13:53 -
\$\begingroup\$ first, you're losing reproducibility. Which is bad for testing/debugging :-). In production, seeding from time does not play well with multiprocessing [running multiple MC chains in parallel] \$\endgroup\$ev-br– ev-br2014年11月10日 14:54:57 +00:00Commented Nov 10, 2014 at 14:54
-
\$\begingroup\$ Can you comment on
r = x*x + y*y
instead ofr = x**2 + y**2
? Is this a style thing or is one more optimized than the other? \$\endgroup\$shadowtalker– shadowtalker2014年11月12日 05:00:28 +00:00Commented Nov 12, 2014 at 5:00
The calculation would be simplified if you centered the circle at the origin. Also, by symmetry, you only need to model one quadrant.
import random
TIMES_TO_REPEAT = 10**5
LENGTH = 10**5
def in_circle(x, y):
return x**2 + y**2 < LENGTH**2
inside_count = 0
for _ in range(TIMES_TO_REPEAT):
point = random.randint(0,LENGTH), random.randint(0,LENGTH)
if in_circle(*point):
inside_count += 1
pi = (inside_count / TIMES_TO_REPEAT) * 4
print(pi)
A few other remarks:
i
is unused; you can call the iteration variable_
.count
will eventually beTIMES_TO_REPEAT
, so you don't have to derive it by incrementing.*point
spares you from having to unpackpoint[0]
andpoint[1]
.- If you require points to be strictly inside the circle (comparing using < rather than ≤), it seems fair to use 0 instead of 1 as the lower bound.
However, it seems that using floating-point numbers increases accuracy more efficiently than longer integer values of LENGTH
. So, we can use the unit circle, and get rid of one arbitrary constant.
TIMES_TO_REPEAT = 5000000
def in_circle(x, y):
return x**2 + y**2 < 1
inside_count = 0
for _ in range(TIMES_TO_REPEAT):
if in_circle(random.random(), random.random()):
inside_count += 1
pi = (inside_count / TIMES_TO_REPEAT) * 4
-
2\$\begingroup\$ Solid advice overall (+1). The last line relies on the python3 float division. Would you mind from future import division. \$\endgroup\$ev-br– ev-br2014年11月10日 13:25:25 +00:00Commented Nov 10, 2014 at 13:25
Some minor issues:
inside_count / count
is the integer division in Python 2. You need to callfrom __future__ import division
to be correct in both versions of Python or have a note in the README pointing that out to the potential user.There is code that can be simplified;
You should try to follow PEP8.
Below is a simplification of your code taking into account the above issues.
"""
[Your explanation]
"""
from __future__ import division
import random
def in_circle(x, y):
return x**2 + y**2 < 1
def compute_pi(samples):
"""
Returns an approximation of pi using Monte Carlo.
The precision increases with sqrt(samples). See ref. [...] why.
"""
counts = 0
for _ in range(samples):
if in_circle(random.random(), random.random()):
counts += 1
return (counts / samples) * 4 # Expression in file docstring
if __name__ == '__main__':
print(compute_pi(1000))
Two things I can suggest:
1. To simplify code, have a unit circle centred at origin.
2. Python supports complex
data type. This is suitable for this problem. With this you can use abs()
function.
My solution:
num_points = 10**5
in_count = 0
for _ in range(num_points):
point = random.uniform(-1,1) + random.uniform(-1,1)*1j
if abs(point)<1: in_count += 1
print(in_count*4/num_points)
One liner (which may be confusing for beginners):
print(sum(1 for _ in range(num_points) if abs(random.uniform(-1,1) + random.uniform(-1,1)*1j) < 1)*4/num_points)
Explore related questions
See similar questions with these tags.