Monte Carlo pi calculation

Question 1

In order to learn the basics of Monte Carlo I calculated pi with it. I also wrote an explanation of the reasoning behind the code.

Down here you can see the circle with random points that I simulated in my code.

enter image description here

"""
 This programme calculates pi with Monte Carlo
Given a square and a circle inside it.
We have
Area_of_the_square = LENGTH ** 2
Area_of_the_circle = radius ** 2 * pi => (LENGTH ** 2) / 4 * pi
The circle is obviously smaller than the square.
We have the equation:
Area_of_the_square * number_less_than_one == Area_of_the_circle
This programme is going to put a big number of points inside the square
(I suggest TIMES_TO_REPEAT = 10**5).
It will then count how many of them are inside the circle,
number_less_than_one = points_inside_the_circle / total_points
After doing some simple math with this formula:
Area_of_the_square * number_less_than_one == Area_of_the_circle
we get that
pi = number_less_than_one * 4
NOTE: This method is deadly slow and quite complicated,
I made this programme just in order to learn.
"""
import random
TIMES_TO_REPEAT = 10**5
LENGTH = 10**5
CENTER = [LENGTH/2,LENGTH/2]
def in_circle(point):
 x = point[0]
 y = point[1]
 center_x = CENTER[0]
 center_y = CENTER[1]
 radius = LENGTH/2
 return (x - center_x)**2 + (y - center_y)**2 < radius**2
count = inside_count = 0
for i in range(TIMES_TO_REPEAT):
 point = random.randint(1,LENGTH),random.randint(1,LENGTH)
 if in_circle(point):
 inside_count += 1
 count += 1
pi = (inside_count / count) * 4
print(pi)

Question 2

Your code seems to be working. Here are a few things to make it better.

Tuple unpacking

x = point[0]
y = point[1]

can be written :

x, y = point

Remove what is not required

You maintain a count variable but you already know the count : it is TIMES_TO_REPEAT.

Variable name

TIMES_TO_REPEAT does not seem to be such a good name. NB_POINTS would probably be better.

Removing the counting logic

Using iterators and sum, you can remove the logic to count points :

inside_count = sum(1 for i in range(NB_POINTS) if in_circle((random.randint(1,LENGTH),random.randint(1,LENGTH))))

Underscore for throwaway values

In your iteration, you don't really care about the value of i. It is quite a common thing to call it _ (more about this).

Move things to a function

It makes things easier to test

def compute_pi(nb_it):
 inside_count = sum(1 for _ in range(nb_it) if in_circle((random.randint(1,LENGTH),random.randint(1,LENGTH))))
 return (inside_count / nb_it) * 4

Add an if main guard

It is quite common to define functions/values/classes/whatever from your python file but to use them only behind an if main guard to make things easier to reuse via module imports for instance.

Oops

Currently, changing the value of LENGTH seems to be changing the result. This is probably wrong.

At this point, my code looks like :

import random
NB_POINTS = 10**5
LENGTH = 10**5
CENTER = [LENGTH/2,LENGTH/2]
def in_circle(point):
 x,y = point
 center_x, center_y = CENTER
 radius = LENGTH/2
 return (x - center_x)**2 + (y - center_y)**2 < radius**2
def compute_pi(nb_it):
 inside_count = sum(1 for _ in range(nb_it) if in_circle((random.randint(1,LENGTH),random.randint(1,LENGTH))))
 return (inside_count / nb_it) * 4
if __name__ == "__main__":
 print(compute_pi(NB_POINTS))

Using the right type and the right values

In order to solve the problem above and make things easier, you could change a bit the referential you are using.

Instead of using a circle of radius R (equal to length/2) and with center (R, R) and picking integer values in square [1, R] * [1, R] maybe you could consider a circle of radius R and with center 0 and pick floating values in [-R, +R]. Even better, you could focus on the corner of the circle by considering points in [0, R]*[0, R]. Finally, R is not really important here, you could pick 1.

Then the code becomes :

import random
NB_POINTS = 10**5
def in_circle(x, y):
 return x**2 + y**2 < 1
def compute_pi(nb_it):
 inside_count = sum(1 for _ in range(nb_it) if in_circle(random.random(),random.random()))
 return (inside_count / nb_it) * 4
if __name__ == "__main__":
 print(compute_pi(NB_POINTS))

And then, this can be simplified.

import random
NB_POINTS = 10**5
def compute_pi(nb_it):
 return 4 * sum(1 for _ in range(nb_it) if random.random()**2 + random.random()**2 < 1) / nb_it
if __name__ == "__main__":
 print(compute_pi(NB_POINTS))

Question 3

In terms of randomness, is random.random() evenly distributed? If not, seeing as random.random()**2 is always less than or equal to 1, would it not be considered the same as random.random()?

Question 4

@Calpratt random() is uniformly distributed between 0 and 1, but random()**2 is not.

Question 5

Also: how is changing LENGTH changing the result? This is to be expected, to some extent, due to statistical fluctuations.

Question 6

Would it be somehow "bad" to, instead of moving everything to one line, instead write def in_circle: random.random()**2 + random.random()**2 < 1? I'm new to Python myself, but I've seen a lot of emphasis on doing "one thing at a time," whereas it seems like that one line in compute_pi is doing a lot of work, but it makes sense (at least to me) to generate the random numbers and check for their location in the same step. And also the line is more than 79 characters (as per PEP8).

Question 7

@ssdecontrol : you are right. It would probably be better but to make this function more useful/meaningful, it should take a radius (or diameter) and the center as arguments and this would bring back the tedious details i tried to remove. As for PEP 8, i am guilty of breaking the rule of 80 characters way too often :-/.

Question 8

First, a comment on the implementation: note that picking a random number between 1 and 10**5 is really no different than picking a random number between 0 and 1; you scale everything by the same amount, so why bother? In fact, requiring the x and y to be integers (and not including 0) reduces the accuracy of your method.

Removing that factor allows the code to be simplified as follows:

import random
def approximate_pi(repeats=10**5):
 """Monte Carlo simulation to approximate pi.
 ... explanation goes here
 """
 inside = 0
 for _ in range(repeats):
 x, y = random.random(), random.random()
 if (((0.5 - x) ** 2) + ((0.5 - y) ** 2)) <= 0.25:
 inside += 1
 return (4 * inside) / repeats
if __name__ == '__main__':
 print approximate_pi()

Note that:

I have moved the explanation into the function docstring, rather than have it sit outside;
I have used the if __name__ == '__main__' guard to run the code, rather than have it sit directly in the module; and
I have made the number of repeats a parameter rather than a module-level constant.

If you did want to factor out inside_circle, I would make LENGTH a parameter of that function:

def inside_circle(point, side_length):
 """Return whether or not point is in the circle.
 Note:
 The circle has a radius of side_length / 2 and sits inside a 
 square with sides of length side_length. Assumes that the 
 square's bottom-left corner is at (0, 0), i.e. that the square 
 is entirely within Quadrant I.
 Args:
 point (tuple of int/float): The x, y coordinates of the point.
 side_length (int/float): The length of the sides of the square
 within which the circle sits.
 """
 ...
def approximate_pi(side_length, repeats=10**5):
 """Monte Carlo simulation to approximate pi."""
 ...
 if inside_circle((x, y), side_length):
 ...

This allows you to from whatever import inside_circle and use it elsewhere without having to worry about whether CENTER is defined.

Question 9

As you've already realized, computational loops in pure python are often unacceptably slow. For this sort of code you're very often better off using numpy. For example:

import numpy as np
def pi_mc(N, seed=1234):
 rndm = np.random.RandomState(seed)
 x = rndm.uniform(size=N)
 y = rndm.uniform(size=N)
 r = x*x + y*y
 return 4. * np.count_nonzero(r < 1) / N
if __name__ == "__main__":
 for N in [100, 1000, 10000, 100000, 1e6]:
 pp = pi_mc(N)
 print N, pp, pp - np.pi

EDIT: As requested, some timings:

In [9]: for N in [100, 1000, 10000, 100000, int(1e6)]:
 %timeit pi_mc(N)
 ...: 
100000 loops, best of 3: 14.5 μs per loop
10000 loops, best of 3: 45.9 μs per loop
1000 loops, best of 3: 351 μs per loop
100 loops, best of 3: 3.36 ms per loop
10 loops, best of 3: 46.4 ms per loop
In [10]: for N in [100, 1000, 10000, 100000, int(1e6)]:
 %timeit pure_python_pi_mc(N)
 ....: 
10000 loops, best of 3: 61.5 μs per loop
1000 loops, best of 3: 548 μs per loop
100 loops, best of 3: 5.47 ms per loop
10 loops, best of 3: 54.9 ms per loop
1 loops, best of 3: 546 ms per loop

Here pure_python_pi_mc is the code from the answer by @200_success, wrapped into a function.

As you can see, relative speed of numpy improves with the number of iterations. This is due to the fact that setting up a loop takes constant time overhead.

Two more notes about numerical side of things:

always seed your random number generator (reproducibility)
better not relying on python2 vs python3 specific division. Use from __future__ import division even on python 3.

Question 10

Welcome to Code Review! Would you mind including a comparison of timing results in your answer?

Question 11

@200_success done

Question 12

always seed your random number generator (reproducibility): but IMO you should not set the default seed to be a definite value! The default at least should be from time...

Question 13

first, you're losing reproducibility. Which is bad for testing/debugging :-). In production, seeding from time does not play well with multiprocessing [running multiple MC chains in parallel]

Question 14

Can you comment on r = x*x + y*y instead of r = x**2 + y**2? Is this a style thing or is one more optimized than the other?

Question 15

The calculation would be simplified if you centered the circle at the origin. Also, by symmetry, you only need to model one quadrant.

import random
TIMES_TO_REPEAT = 10**5
LENGTH = 10**5
def in_circle(x, y):
 return x**2 + y**2 < LENGTH**2
inside_count = 0
for _ in range(TIMES_TO_REPEAT):
 point = random.randint(0,LENGTH), random.randint(0,LENGTH)
 if in_circle(*point):
 inside_count += 1
pi = (inside_count / TIMES_TO_REPEAT) * 4
print(pi)

A few other remarks:

i is unused; you can call the iteration variable _.
count will eventually be TIMES_TO_REPEAT, so you don't have to derive it by incrementing.
*point spares you from having to unpack point[0] and point[1].
If you require points to be strictly inside the circle (comparing using < rather than ≤), it seems fair to use 0 instead of 1 as the lower bound.

However, it seems that using floating-point numbers increases accuracy more efficiently than longer integer values of LENGTH. So, we can use the unit circle, and get rid of one arbitrary constant.

TIMES_TO_REPEAT = 5000000
def in_circle(x, y):
 return x**2 + y**2 < 1
inside_count = 0
for _ in range(TIMES_TO_REPEAT):
 if in_circle(random.random(), random.random()):
 inside_count += 1
pi = (inside_count / TIMES_TO_REPEAT) * 4

Question 16

Solid advice overall (+1). The last line relies on the python3 float division. Would you mind from future import division.

Question 17

Some minor issues:

inside_count / count is the integer division in Python 2. You need to call from __future__ import division to be correct in both versions of Python or have a note in the README pointing that out to the potential user.
There is code that can be simplified;
You should try to follow PEP8.

Below is a simplification of your code taking into account the above issues.

"""
[Your explanation]
"""
from __future__ import division
import random
def in_circle(x, y):
 return x**2 + y**2 < 1
def compute_pi(samples):
 """
 Returns an approximation of pi using Monte Carlo.
 The precision increases with sqrt(samples). See ref. [...] why.
 """
 counts = 0
 for _ in range(samples):
 if in_circle(random.random(), random.random()):
 counts += 1
 return (counts / samples) * 4 # Expression in file docstring
if __name__ == '__main__':
 print(compute_pi(1000))

Question 18

Two things I can suggest: 1. To simplify code, have a unit circle centred at origin. 2. Python supports complex data type. This is suitable for this problem. With this you can use abs() function.

My solution:

num_points = 10**5
in_count = 0
for _ in range(num_points):
 point = random.uniform(-1,1) + random.uniform(-1,1)*1j
 if abs(point)<1: in_count += 1
print(in_count*4/num_points)

One liner (which may be confusing for beginners):

print(sum(1 for _ in range(num_points) if abs(random.uniform(-1,1) + random.uniform(-1,1)*1j) < 1)*4/num_points)

SylvainD SylvainD 29.7k1 gold badge49 silver badges93 bronze badges · Accepted Answer · 2014-11-10 11:02:42Z

Your code seems to be working. Here are a few things to make it better.

Tuple unpacking

x = point[0]
y = point[1]

can be written :

x, y = point

Remove what is not required

You maintain a count variable but you already know the count : it is TIMES_TO_REPEAT.

Variable name

TIMES_TO_REPEAT does not seem to be such a good name. NB_POINTS would probably be better.

Removing the counting logic

Using iterators and sum, you can remove the logic to count points :

inside_count = sum(1 for i in range(NB_POINTS) if in_circle((random.randint(1,LENGTH),random.randint(1,LENGTH))))

Underscore for throwaway values

In your iteration, you don't really care about the value of i. It is quite a common thing to call it _ (more about this).

Move things to a function

It makes things easier to test

def compute_pi(nb_it):
 inside_count = sum(1 for _ in range(nb_it) if in_circle((random.randint(1,LENGTH),random.randint(1,LENGTH))))
 return (inside_count / nb_it) * 4

Add an if main guard

It is quite common to define functions/values/classes/whatever from your python file but to use them only behind an if main guard to make things easier to reuse via module imports for instance.

Oops

Currently, changing the value of LENGTH seems to be changing the result. This is probably wrong.

At this point, my code looks like :

import random
NB_POINTS = 10**5
LENGTH = 10**5
CENTER = [LENGTH/2,LENGTH/2]
def in_circle(point):
 x,y = point
 center_x, center_y = CENTER
 radius = LENGTH/2
 return (x - center_x)**2 + (y - center_y)**2 < radius**2
def compute_pi(nb_it):
 inside_count = sum(1 for _ in range(nb_it) if in_circle((random.randint(1,LENGTH),random.randint(1,LENGTH))))
 return (inside_count / nb_it) * 4
if __name__ == "__main__":
 print(compute_pi(NB_POINTS))

Using the right type and the right values

In order to solve the problem above and make things easier, you could change a bit the referential you are using.

Instead of using a circle of radius R (equal to length/2) and with center (R, R) and picking integer values in square [1, R] * [1, R] maybe you could consider a circle of radius R and with center 0 and pick floating values in [-R, +R]. Even better, you could focus on the corner of the circle by considering points in [0, R]*[0, R]. Finally, R is not really important here, you could pick 1.

Then the code becomes :

import random
NB_POINTS = 10**5
def in_circle(x, y):
 return x**2 + y**2 < 1
def compute_pi(nb_it):
 inside_count = sum(1 for _ in range(nb_it) if in_circle(random.random(),random.random()))
 return (inside_count / nb_it) * 4
if __name__ == "__main__":
 print(compute_pi(NB_POINTS))

And then, this can be simplified.

import random
NB_POINTS = 10**5
def compute_pi(nb_it):
 return 4 * sum(1 for _ in range(nb_it) if random.random()**2 + random.random()**2 < 1) / nb_it
if __name__ == "__main__":
 print(compute_pi(NB_POINTS))

In terms of randomness, is random.random() evenly distributed? If not, seeing as random.random()**2 is always less than or equal to 1, would it not be considered the same as random.random()?
@Calpratt random() is uniformly distributed between 0 and 1, but random()**2 is not.
Also: how is changing LENGTH changing the result? This is to be expected, to some extent, due to statistical fluctuations.
Would it be somehow "bad" to, instead of moving everything to one line, instead write def in_circle: random.random()**2 + random.random()**2 < 1? I'm new to Python myself, but I've seen a lot of emphasis on doing "one thing at a time," whereas it seems like that one line in compute_pi is doing a lot of work, but it makes sense (at least to me) to generate the random numbers and check for their location in the same step. And also the line is more than 79 characters (as per PEP8).
@ssdecontrol : you are right. It would probably be better but to make this function more useful/meaningful, it should take a radius (or diameter) and the center as arguments and this would bring back the tedious details i tried to remove. As for PEP 8, i am guilty of breaking the rule of 80 characters way too often :-/.

Stack Exchange Network

Monte Carlo pi calculation

6 Answers 6

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Monte Carlo pi calculation

6 Answers 6

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions