Calculating the Average Minimum Length of Random Numbers From 0 - 1 that Adds up to Greater Than 1

Question 1

My Computer Science teacher gave us a task to calculate the average after one billion trials.

His exact assignment was this:

Consider generating a sequence of random numbers on the interval [0, 1) perhaps using Math.random() in a loop and adding them together. How many numbers would you expect to generate in such a sequence before the sum exceeds 1.0 ? (i.e. probability)

Write a program that simulates these trials a billion times and calculates the average length of the sequences. This is a neat exercise in nested loops.

Examples:

0.449 0.814

length of sequence: 2

0.167 0.138 0.028 0.934

length of sequence: 4

0.640 0.258 0.417

length of sequence: 3

0.911 0.212

length of sequence: 2

Average of the four lengths is 11/4 ≈ 2.75

What is the average of one billion random sequences?

My code was this:

import random
def genSequence():
 storenums = 0
 numTrials = 1000000000
 for x in range(0,numTrials):
 numberOfAttempts = 0
 getToOne = 0 
 while (getToOne < 1): #keeps on generating random numbers and adding it to getToOne until it reaches 1 or is over 1
 getToOne += random.random()
 numberOfAttempts += 1
 storenums = storenums + numberOfAttempts
 #print (x)
 #print(storenums)
 calculateAverage(storenums,numTrials)
def calculateAverage(num,den):
 average = num/den
 print(average)
genSequence()

*Note: I am using repl.it to run my code so there is no main.

The problem with my code is that it can't reach 1 billion trials and stops working at around 227,035. I'm pretty sure this is a memory issue but I don't know how to fix this. What can I do so that it actually completes the billion trials and preferably not in an egregiously long amount of time.

EDIT: My teacher the result should be e, but that isn't the point as I just need to write the code. Getting e just means I did it right.

Question 2

If I rewrite genSequence so that it takes numTrials as an argument, then I get the following timing in CPython:

Python 3.6.4 (default, Dec 21 2017, 20:33:21) 
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715

Based on this, it would take about 10 minutes to compute genSequence(10**9). Possibly you just didn't wait long enough.

This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:

[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243

On PyPy you should be able to carry out \10ドル^9\$ trials in under a minute (on my computer it takes 51 seconds).

Some review points:

The number 1000000000 is hard to read — it could easily be confused with 100000000 or 10000000000. I would write 10**9 to make it clear.
There's no need for the variable numberOfAttempts; you could just add one to storenums on each loop.
The name storenums is a bit vague. This is the total length of the random sequences generated so far, so a name like total_length would be clearer.
Similarly, the name genSequence is vague. This calculates the mean length of a random sequence, so a name like mean_sequence_length would be clearer.
The meaning of the constant 1 is not altogether clear. I would give it a name like target_sum.
When a loop variable like x is not used, it's conventional to name it _.
range(0,numTrials) can be written range(numTrials).

Revised code:

import random
def mean_sequence_length(trials, target_sum=1.0):
 """Return mean length of random sequences adding up to at least
 target_sum (carrying out the given number of trials).
 """
 total_length = 0
 for _ in range(trials):
 current_sum = 0.0
 while current_sum < target_sum:
 current_sum += random.random()
 total_length += 1
 return total_length / trials

Question 3

Additionaly, instead of writing 10**9, one could write 1_000_000_000.

Question 4

@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?

Question 5

@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of trials and extrapolate to work out how long it is going to take.

Question 6

I measured preformance an it says 63.16667 minutes.

Question 7

Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2018-02-08 15:50:46Z

If I rewrite genSequence so that it takes numTrials as an argument, then I get the following timing in CPython:

Python 3.6.4 (default, Dec 21 2017, 20:33:21) 
>>> from timeit import timeit
>>> timeit(lambda:genSequence(10**8), number=1)
2.71825759
62.77562193598715

Based on this, it would take about 10 minutes to compute genSequence(10**9). Possibly you just didn't wait long enough.

This kind of loop-heavy numerical code generally runs much faster if you use PyPy, which has a "just-in-time" compiler. I get more than ten times speedup with PyPy:

[PyPy 5.10.0 with GCC 4.2.1 Compatible Apple LLVM 9.0] on darwin
>>>> from timeit import timeit
>>>> timeit(lambda:genSequence(10**8), number=1)
2.71816679
5.389536142349243

On PyPy you should be able to carry out \10ドル^9\$ trials in under a minute (on my computer it takes 51 seconds).

Some review points:

The number 1000000000 is hard to read — it could easily be confused with 100000000 or 10000000000. I would write 10**9 to make it clear.
There's no need for the variable numberOfAttempts; you could just add one to storenums on each loop.
The name storenums is a bit vague. This is the total length of the random sequences generated so far, so a name like total_length would be clearer.
Similarly, the name genSequence is vague. This calculates the mean length of a random sequence, so a name like mean_sequence_length would be clearer.
The meaning of the constant 1 is not altogether clear. I would give it a name like target_sum.
When a loop variable like x is not used, it's conventional to name it _.
range(0,numTrials) can be written range(numTrials).

Revised code:

import random
def mean_sequence_length(trials, target_sum=1.0):
 """Return mean length of random sequences adding up to at least
 target_sum (carrying out the given number of trials).
 """
 total_length = 0
 for _ in range(trials):
 current_sum = 0.0
 while current_sum < target_sum:
 current_sum += random.random()
 total_length += 1
 return total_length / trials

Additionaly, instead of writing 10**9, one could write 1_000_000_000.
@Gareth Rees I've run it on PyPy and it certainly much faster with your edits but it is not running in under a minute like you said it should. Would that be a computer issue or more like some mistake on my part?
@PGODULTIMATE: Maybe my computer is faster than yours. Measure the performance on some smaller values of trials and extrapolate to work out how long it is going to take.
Actually I realized that I didn't comment out my print statement (I used it to check progress). Now the preformance time is about 1.252 minutes for 1 billion trials. Thank you very much!

Stack Exchange Network

Calculating the Average Minimum Length of Random Numbers From 0 - 1 that Adds up to Greater Than 1

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Calculating the Average Minimum Length of Random Numbers From 0 - 1 that Adds up to Greater Than 1

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions