Proof that I was wrong about Random Number Generators

Question 1

I wrote this code to show how wrong I was about Random Number Generators.

Context:

From the comments on this answer to Regularity in the "Rusty Towel of Mutual understanding":

that isn't really a percentage of chance though, the chance that the
number returned is under 10 and/or under 50 is not 10% or 50%. you
actually will have a higher percentage of numbers in the middle of the
range than at the extremes. and I wish I could reference something
here, but I don't remember why/how I know that. – Malachi 3 hours ago

@Malachi - you are wrong with that statement. – rolfl♦ 3 hours ago

I don't have access to running Java Code, but I could create a random number generator with C# and see what happens.... – Malachi 3 hours ago

@Malachi and you'd be testing the effect of using C#'s RNG with java calls. Java's Random.nextInt(int n), returns 0-(n-1), pseudorandomly distributed. – Pimgd 2 hours ago

I agree it would be totally different, but aren't both RNG's based on a C function or a C++ function? wouldn't they be fairly similar? – Malachi 2 hours ago

Is there anything that I can do to make it more apparent how wrong that I am about the distribution of random numbers? Anything I can do to make this code cleaner?

Random rng = new Random();
Dictionary<int, UInt64> tallyCount = new Dictionary<int, UInt64>();
//Key is 1-100 and value is number of times it appears
for (int i = 1; i < 101; i++)
{
 tallyCount.Add(i, 0);
}
UInt64 totalNumbers;
Console.WriteLine("How many numbers do you require?");
totalNumbers = Convert.ToUInt64(Console.ReadLine()); 
for (UInt64 i = 0; i < totalNumbers; i++)
{
 int randomNumber = rng.Next(1, 101);
 tallyCount[randomNumber] += 1;
}
foreach(KeyValuePair<int,ulong> kvp in tallyCount)
{
 double percentageOfTotal = (double)kvp.Value / (double)totalNumbers;
 Console.WriteLine("{0}: {1} --> Percentage: {2}", kvp.Key.ToString(), kvp.Value.ToString(), percentageOfTotal.ToString("P4"));
}
Console.ReadLine();

Output:

How many numbers do you require?
5000
1: 57 --> Percentage: 1.1400 %
2: 43 --> Percentage: 0.8600 %
3: 55 --> Percentage: 1.1000 %
4: 39 --> Percentage: 0.7800 %
5: 57 --> Percentage: 1.1400 %
6: 61 --> Percentage: 1.2200 %
7: 44 --> Percentage: 0.8800 %
8: 52 --> Percentage: 1.0400 %
9: 52 --> Percentage: 1.0400 %
10: 46 --> Percentage: 0.9200 %
11: 52 --> Percentage: 1.0400 %
12: 58 --> Percentage: 1.1600 %
13: 50 --> Percentage: 1.0000 %
14: 36 --> Percentage: 0.7200 %
15: 55 --> Percentage: 1.1000 %
16: 42 --> Percentage: 0.8400 %
17: 46 --> Percentage: 0.9200 %
18: 47 --> Percentage: 0.9400 %
19: 64 --> Percentage: 1.2800 %
20: 55 --> Percentage: 1.1000 %
21: 46 --> Percentage: 0.9200 %
22: 45 --> Percentage: 0.9000 %
23: 49 --> Percentage: 0.9800 %
24: 50 --> Percentage: 1.0000 %
25: 38 --> Percentage: 0.7600 %
26: 60 --> Percentage: 1.2000 %
27: 44 --> Percentage: 0.8800 %
28: 52 --> Percentage: 1.0400 %
29: 57 --> Percentage: 1.1400 %
30: 44 --> Percentage: 0.8800 %
31: 58 --> Percentage: 1.1600 %
32: 53 --> Percentage: 1.0600 %
33: 52 --> Percentage: 1.0400 %
34: 45 --> Percentage: 0.9000 %
35: 43 --> Percentage: 0.8600 %
36: 58 --> Percentage: 1.1600 %
37: 55 --> Percentage: 1.1000 %
38: 59 --> Percentage: 1.1800 %
39: 57 --> Percentage: 1.1400 %
40: 49 --> Percentage: 0.9800 %
41: 51 --> Percentage: 1.0200 %
42: 41 --> Percentage: 0.8200 %
43: 41 --> Percentage: 0.8200 %
44: 46 --> Percentage: 0.9200 %
45: 43 --> Percentage: 0.8600 %
46: 52 --> Percentage: 1.0400 %
47: 56 --> Percentage: 1.1200 %
48: 49 --> Percentage: 0.9800 %
49: 44 --> Percentage: 0.8800 %
50: 65 --> Percentage: 1.3000 %
51: 49 --> Percentage: 0.9800 %
52: 46 --> Percentage: 0.9200 %
53: 51 --> Percentage: 1.0200 %
54: 50 --> Percentage: 1.0000 %
55: 53 --> Percentage: 1.0600 %
56: 44 --> Percentage: 0.8800 %
57: 54 --> Percentage: 1.0800 %
58: 45 --> Percentage: 0.9000 %
59: 59 --> Percentage: 1.1800 %
60: 47 --> Percentage: 0.9400 %
61: 52 --> Percentage: 1.0400 %
62: 45 --> Percentage: 0.9000 %
63: 49 --> Percentage: 0.9800 %
64: 59 --> Percentage: 1.1800 %
65: 50 --> Percentage: 1.0000 %
66: 55 --> Percentage: 1.1000 %
67: 60 --> Percentage: 1.2000 %
68: 46 --> Percentage: 0.9200 %
69: 49 --> Percentage: 0.9800 %
70: 61 --> Percentage: 1.2200 %
71: 48 --> Percentage: 0.9600 %
72: 38 --> Percentage: 0.7600 %
73: 60 --> Percentage: 1.2000 %
74: 44 --> Percentage: 0.8800 %
75: 46 --> Percentage: 0.9200 %
76: 45 --> Percentage: 0.9000 %
77: 50 --> Percentage: 1.0000 %
78: 50 --> Percentage: 1.0000 %
79: 52 --> Percentage: 1.0400 %
80: 48 --> Percentage: 0.9600 %
81: 54 --> Percentage: 1.0800 %
82: 45 --> Percentage: 0.9000 %
83: 56 --> Percentage: 1.1200 %
84: 49 --> Percentage: 0.9800 %
85: 51 --> Percentage: 1.0200 %
86: 55 --> Percentage: 1.1000 %
87: 55 --> Percentage: 1.1000 %
88: 49 --> Percentage: 0.9800 %
89: 57 --> Percentage: 1.1400 %
90: 51 --> Percentage: 1.0200 %
91: 46 --> Percentage: 0.9200 %
92: 48 --> Percentage: 0.9600 %
93: 50 --> Percentage: 1.0000 %
94: 44 --> Percentage: 0.8800 %
95: 53 --> Percentage: 1.0600 %
96: 44 --> Percentage: 0.8800 %
97: 43 --> Percentage: 0.8600 %
98: 47 --> Percentage: 0.9400 %
99: 39 --> Percentage: 0.7800 %
100: 46 --> Percentage: 0.9200 %

Question 2

Personally, I find long much more usable than Int64 in the same way you use int and not Int32.

Question 3

UInt64 and ulong are the same thing. I was coding with the intent of being consistent with the type names. I am kind of geeky like this, but I like UInt64 better than ulong for some reason. I don't know that long is more usable than Int64, though. Why do you phrase it like that, @JeroenVannevel?

Question 4

I'm not sure what caused your initial misconception, but maybe you were getting confused with sums/averages of random numbers, which do indeed tend towards the middle of the range. For example, if you roll one die, you're equally likely to get any number, but if you roll two, you're much more likely to get 7 than 12.
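To make the dice point concrete, here is a minimal sketch that simply enumerates all 36 equally likely rolls of two dice and counts how many produce each total. The class and method names (DiceSums, CountWays) are made up for this illustration.

```csharp
using System;

static class DiceSums
{
    // Counts how many of the 36 equally likely (a, b) rolls produce each total.
    // Index s holds the number of ways to roll a total of s (indices 0 and 1 stay 0).
    public static int[] CountWays()
    {
        int[] ways = new int[13];
        for (int a = 1; a <= 6; a++)
        {
            for (int b = 1; b <= 6; b++)
            {
                ways[a + b]++;
            }
        }
        return ways;
    }

    static void Main()
    {
        int[] ways = CountWays();
        for (int sum = 2; sum <= 12; sum++)
        {
            // One '#' per way of rolling this total.
            Console.WriteLine("{0,2}: {1} of 36  {2}", sum, ways[sum], new string('#', ways[sum]));
        }
    }
}
```

A single die is flat (each face has one way out of six), while the two-dice totals peak at 7 with six ways and fall off to one way each for 2 and 12. A single call to rng.Next behaves like the single die, not like the sum.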

Question 5

"Is there anything that I can do to make it more apparent how wrong that I am about the distribution of random numbers?" Well, I think if you really wanted to there's some statistics we could use here. Specifically, if the output tended towards the middle of the range and away from the extremes, then the standard deviation would be lower than it ought to be. If you calculate the standard deviation of the simple list [1, 2, 3, ..., 100] and compare it to the standard deviation of the random sample you produce, your initial hypothesis would imply the std dev. of the sample would be lower.

Question 6

@BenAaronson Yeah, the theory behind that is called the central limit theorem.
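Going back to the standard-deviation comparison suggested a couple of comments up, a rough sketch could look like this. It assumes the tallies live in an array where index i holds the count for the value i + 1; the names SpreadCheck and StdDev are hypothetical, and the 5000 draws just mirror the run in the question.

```csharp
using System;

static class SpreadCheck
{
    // Population standard deviation of the drawn values,
    // where counts[i] is how many times the value (i + 1) was drawn.
    public static double StdDev(ulong[] counts)
    {
        double total = 0, sum = 0;
        for (int i = 0; i < counts.Length; i++)
        {
            total += counts[i];
            sum += (double)counts[i] * (i + 1);
        }
        double mean = sum / total;

        double squares = 0;
        for (int i = 0; i < counts.Length; i++)
        {
            double d = (i + 1) - mean;
            squares += counts[i] * d * d;
        }
        return Math.Sqrt(squares / total);
    }

    static void Main()
    {
        const int Maximum = 100;

        // Baseline: the plain list [1, 2, ..., 100], i.e. one of each value.
        ulong[] baseline = new ulong[Maximum];
        for (int i = 0; i < Maximum; i++) baseline[i] = 1;

        // Sample: 5000 draws, as in the question.
        ulong[] sample = new ulong[Maximum];
        Random rng = new Random();
        for (int i = 0; i < 5000; i++) sample[rng.Next(1, Maximum + 1) - 1]++;

        Console.WriteLine("Baseline std dev: {0:F4}", StdDev(baseline));
        Console.WriteLine("Sample std dev:   {0:F4}", StdDev(sample));
    }
}
```

The baseline works out to sqrt((100^2 - 1) / 12), roughly 28.87. If the generator favoured the middle of the range, the sample figure would sit noticeably below that.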

Question 7

You've implemented a program to produce a histogram (though in tabular rather than visual form). That lets you judge "by eyeball" whether the distribution looks uniform. You would expect each bin to contain 1% of the samples. But how much deviation is allowable before you lose confidence?

Since you used the word "proof" in your question, I feel compelled to mention that eyeballing would not be good enough proof in academic terms. Statisticians actually have quantitative tests (Pearson's Chi-Squared test) to answer these questions. (Such tests are frequently used in medical publications, for example.)

The hypothesis is: "These outputs of rng.Next(1, 101) came from a discrete uniform distribution." You would start by calculating \$\chi^2\$ over the 100 bins. Then, you look up the \$\chi^2\$ value in the table for 99 degrees of freedom (the number of bins minus one, not totalNumbers - 1). That gives you a p-value: the probability of seeing deviations at least this large if the generator really is uniform.

That is Pearson's Chi-squared test. While it would be tricky to actually derive the table and thus automate the entire test, you could at least compute \$\chi^2\$, which is easy to do.
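As a sketch of that computation: the statistic is the sum over all bins of (observed - expected)^2 / expected, where expected is the total divided by the number of bins. The class name, the array layout and the 5000 draws below are assumptions for illustration. The constant 123.225 is the approximate 5% critical value for 99 degrees of freedom as found in standard tables; treat it as an assumption of this sketch rather than part of the test itself.

```csharp
using System;

static class ChiSquared
{
    // Pearson's statistic against a uniform expectation:
    // the sum over all bins of (observed - expected)^2 / expected.
    public static double Statistic(ulong[] counts)
    {
        double total = 0;
        foreach (ulong c in counts) total += c;
        double expected = total / counts.Length;

        double chi2 = 0;
        foreach (ulong c in counts)
        {
            double diff = c - expected;
            chi2 += diff * diff / expected;
        }
        return chi2;
    }

    static void Main()
    {
        const int Bins = 100;
        const int Draws = 5000;
        // Assumption: approximate 5% critical value for Bins - 1 = 99 degrees of freedom.
        const double Critical = 123.225;

        ulong[] tally = new ulong[Bins];
        Random rng = new Random();
        for (int i = 0; i < Draws; i++)
        {
            tally[rng.Next(1, Bins + 1) - 1]++;
        }

        double chi2 = Statistic(tally);
        Console.WriteLine("chi-squared = {0:F4} with {1} degrees of freedom", chi2, Bins - 1);
        Console.WriteLine(chi2 > Critical
            ? "Larger than the 5% critical value: suspicious"
            : "Below the 5% critical value: consistent with uniform");
    }
}
```

Comparing against a single critical value sidesteps the table lookup. It only answers "would a uniform generator produce a value this large less than 5% of the time?", so an occasional "suspicious" result from a good generator is expected.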

Question 8

Most of this went over my head. Are you talking about variance (standard deviation) from a baseline, as if I pulled 1000 numbers and assumed there were exactly 10 of each number between 1 and 100?

Question 9

Additional to this answer: 1. Check the NIST's monobits test. 2. Generate just a 0 or a 1 instead of 1..101. 3. Assuming there's a 50/50 chance for a 1 or a 0, the X^2 test gives you a probability that the distribution of your RNG is uniform. You'll want that as close to 1 as possible. 4. Indeed, a Python program giving such an answer is much better than your "look through the eye laces"-approach.
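For the 0-or-1 variant, the statistic of the frequency (monobit) test is easy to compute: count each 1 as +1 and each 0 as -1, then divide the absolute sum by sqrt(n). The sketch below uses hypothetical names, and the cutoff of 2.576 is my assumption for the 1% significance level that the NIST suite uses (it is the two-sided 1% point of the normal distribution).

```csharp
using System;

static class Monobit
{
    // Statistic of the frequency (monobit) test: |#ones - #zeros| / sqrt(n).
    public static double Statistic(int[] bits)
    {
        long sum = 0;
        foreach (int bit in bits)
        {
            sum += (bit == 1) ? 1 : -1;
        }
        return Math.Abs(sum) / Math.Sqrt(bits.Length);
    }

    static void Main()
    {
        // Assumption: cutoff matching a 1% significance level.
        const double Cutoff = 2.576;

        int[] bits = new int[100000];
        Random rng = new Random();
        for (int i = 0; i < bits.Length; i++)
        {
            bits[i] = rng.Next(2); // 0 or 1
        }

        double s = Statistic(bits);
        Console.WriteLine("s_obs = {0:F4}", s);
        Console.WriteLine(s <= Cutoff ? "Passes the monobit check" : "Fails the monobit check");
    }
}
```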

Question 10

It would be tricky to automate the test, but you can use a ready implementation from Rosetta Code ;) The C version gives "dof: 99 distance: 75.2400 probability: 0.963911 uniform? Yes" for @Malachi's dataset from the question.

Question 11

I need to upgrade this application to render a nice histogram (or several) so that I can show what happens when you start retrieving more and more "random" numbers, give a statement about the results of my research, and perhaps ask another question.
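Short of a real chart, a console histogram is a cheap first step. This is only a sketch: the names TextHistogram and Bar, the 0-based tally array and the 60-character width are all choices made for illustration.

```csharp
using System;

static class TextHistogram
{
    // Scales a count to a bar of '#' characters so that the largest bin is `width` characters long.
    public static string Bar(ulong count, ulong maxCount, int width)
    {
        if (maxCount == 0) return string.Empty;
        int length = (int)(count * (ulong)width / maxCount);
        return new string('#', length);
    }

    static void Main()
    {
        ulong[] tally = new ulong[100];
        Random rng = new Random();
        for (int i = 0; i < 5000; i++)
        {
            tally[rng.Next(tally.Length)]++;
        }

        // Find the tallest bin so every bar is scaled relative to it.
        ulong max = 0;
        foreach (ulong c in tally)
        {
            if (c > max) max = c;
        }

        for (int i = 0; i < tally.Length; i++)
        {
            Console.WriteLine("{0,3}: {1}", i + 1, Bar(tally[i], max, 60));
        }
    }
}
```

Running it with larger and larger draw counts should show the bars evening out, which is the effect described above.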

Question 12

Using a dictionary here is complete overkill, a simple array would be fine.

It would also be easier to manage if you used random values from 0 to 99 instead of 1 to 100. Then you could use each value directly as an index into the array.

Random rng = new Random();
UInt64 totalNumbers = 500000;
UInt64[] tallyCount = new UInt64[100];
for (UInt64 i = 0; i < totalNumbers; i++)
{
 tallyCount[rng.Next(tallyCount.Length)]++;
}
for(int i = 0; i < tallyCount.Length; i++)
{
 double percentageOfTotal = (double)tallyCount[i] / (double)totalNumbers;
 Console.WriteLine("{0}: {1} --> Percentage: {2}", i, tallyCount[i], percentageOfTotal.ToString("P4"));
}

The use of a Dictionary for such a simple operation is overkill....

See the Ideone here

Question 13

Agreed. I went with rng.Next(1, 101) because with an upper bound of 100 my 100 value wasn't being populated with data; the upper bound of Random.Next is exclusive. Spot on.

Question 14

Minor "improvements":

Use a named constant for your upper bound:
```
const int Maximum = 100;
```

Initialization of your dictionary can be done with Linq:

var tallyCount = Enumerable.Range(1, Maximum).ToDictionary(i => i, i => 0UL);

You can merge declaration and assignment:

Console.WriteLine("How many numbers do you require?");
UInt64 totalNumbers = Convert.ToUInt64(Console.ReadLine());

Question 15

lol, #3 I was doing until I tried to get fancy by trying a tryparse but I messed up a bunch of logic and brain started to melt, so I reverted and forgot about the declaration/assignment merge. thanks
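For what it's worth, the TryParse version does not need much logic. A minimal sketch, with InputHelper and TryReadCount as made-up names and "greater than zero" as my assumption about what counts as valid input:

```csharp
using System;

static class InputHelper
{
    // Returns true and sets `value` only when `text` is a whole number greater than zero.
    public static bool TryReadCount(string text, out ulong value)
    {
        return ulong.TryParse(text, out value) && value > 0;
    }

    static void Main()
    {
        ulong totalNumbers = 0;
        string line;

        Console.WriteLine("How many numbers do you require?");
        // Keep asking until the input parses; stop if the input stream ends (ReadLine returns null).
        while ((line = Console.ReadLine()) != null && !TryReadCount(line, out totalNumbers))
        {
            Console.WriteLine("Please enter a whole number greater than zero.");
        }

        if (line == null)
        {
            Console.WriteLine("No input given.");
            return;
        }
        Console.WriteLine("Generating {0} numbers...", totalNumbers);
    }
}
```

Keeping the parsing in its own method leaves the loop in Main short, and the declaration of totalNumbers stays next to the prompt.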

200_success · Accepted Answer · 2014-10-01 20:27:03Z

