My Computer Science teacher gave us a task to calculate the average after one billion trials.
His exact assignment was this:
Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)
Write a program that simulates these trials a billion times and calculates the average length of the sequences. This is a neat exercise in nested loops.
Examples:
- 0.449 0.814
- length of sequence: 2
- 0.167 0.138 0.028 0.934
- length of sequence: 4
- 0.640 0.258 0.417
- length of sequence: 3
- 0.911 0.212
- length of sequence: 2
Average of the four lengths is 11/4 ≈ 2.75
What is the average of one billion random sequences?
My code was this:
import random
def genSequence():
storenums = 0
numTrials = 1000000000
for x in range(0,numTrials):
numberOfAttempts = 0
getToOne = 0
while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
getToOne += random.random()
numberOfAttempts += 1
storenums = storenums + numberOfAttempts
#print (x)
#print(storenums)
calculateAverage(storenums,numTrials)
def calculateAverage(num,den):
average = num/den
print(average)
genSequence()
*Note: I am using repl.it to run my code so there is no main.
The problem with my code is that it can't reach 1 billion trials and stops working at around 227,035. I'm pretty sure this is a memory issue but I don't know how to fix this. What can I do so that it actually completes the billion trials and preferably not in an egregiously long amount of time.
EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.
1 Answer 1
If I rewrite genSequence
so that it takes numTrials
as an argument, then I get the following timing in CPython:
Python 3.6.4 (default, Dec 21 2017, 20:33:21)
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715
Based on this, it would take about 10 minutes to compute genSequence(10**9)
. Possibly you just didn't wait long enough.
This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:
[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243
On PyPy you should be able to carry out \10ドル^9\$ trials in under a minute (on my computer it takes 51 seconds).
Some review points:
The number
1000000000
is hard to read — it could easily be confused with100000000
or10000000000
. I would write10**9
to make it clear.There's no need for the variable
numberOfAttempts
; you could just add one tostorenums
on each loop.The name
storenums
is a bit vague. This is the total length of the random sequences generated so far, so a name liketotal_length
would be clearer.Similarly, the name
genSequence
is vague. This calculates the mean length of a random sequence, so a name likemean_sequence_length
would be clearer.The meaning of the constant 1 is not altogether clear. I would give it a name like
target_sum
.When a loop variable like
x
is not used, it's conventional to name it_
.range(0,numTrials)
can be writtenrange(numTrials)
.
Revised code:
import random
def mean_sequence_length(trials, target_sum=1.0):
"""Return mean length of random sequences adding up to at least
target_sum (carrying out the given number of trials).
"""
total_length = 0
for _ in range(trials):
current_sum = 0.0
while current_sum < target_sum:
current_sum += random.random()
total_length += 1
return total_length / trials
-
2\$\begingroup\$ Additionaly, instead of writing
10**9
, one could write1_000_000_000
. \$\endgroup\$301_Moved_Permanently– 301_Moved_Permanently2018年02月08日 16:13:57 +00:00Commented Feb 8, 2018 at 16:13 -
\$\begingroup\$ @Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part? \$\endgroup\$PGODULTIMATE– PGODULTIMATE2018年02月08日 23:49:57 +00:00Commented Feb 8, 2018 at 23:49
-
\$\begingroup\$ @PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of
trials
and extrapolate to work out how long it is going to take. \$\endgroup\$Gareth Rees– Gareth Rees2018年02月08日 23:59:07 +00:00Commented Feb 8, 2018 at 23:59 -
\$\begingroup\$ I measured preformance an it says 63.16667 minutes. \$\endgroup\$PGODULTIMATE– PGODULTIMATE2018年02月09日 17:18:20 +00:00Commented Feb 9, 2018 at 17:18
-
\$\begingroup\$ Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much! \$\endgroup\$PGODULTIMATE– PGODULTIMATE2018年02月09日 17:40:54 +00:00Commented Feb 9, 2018 at 17:40
Explore related questions
See similar questions with these tags.