I wrote this code to show how wrong I was about Random Number Generators.
Context:
From Comments to this answer to Regularity in the "Rusty Towel of Mutual understanding"
that isn't really a percentage of chance though, the chance that the number returned is under 10 and/or under 50 is not 10% or 50%. you actually will have a higher percentage of numbers in the middle of the range than at the extremes. and I wish I could reference something here, but I don't remember why/how I know that. – Malachi 3 hours ago
@Malachi - you are wrong with that statement. – rolfl♦ 3 hours ago
I don't have access to running Java Code, but I could create a random number generator with C# and see what happens.... – Malachi 3 hours ago
@Malachi and you'd be testing the effect of using C#'s RNG with java calls. Java's Random.nextInt(int n), returns 0-(n-1), pseudorandomly distributed. – Pimgd 2 hours ago
I agree it would be totally different, but aren't both RNG's based on a C function or a C++ function? wouldn't they be fairly similar? – Malachi 2 hours ago
Is there anything that I can do to make it more apparent how wrong that I am about the distribution of random numbers? Anything I can do to make this code cleaner?
Random rng = new Random();
Dictionary<int, UInt64> tallyCount = new Dictionary<int, UInt64>();
//Key is 1-100 and value is number of times it appears
for (int i = 1; i < 101; i++)
{
tallyCount.Add(i, 0);
}
UInt64 totalNumbers;
Console.WriteLine("How many numbers do you require?");
totalNumbers = Convert.ToUInt64(Console.ReadLine());
for (UInt64 i = 0; i < totalNumbers; i++)
{
int randomNumber = rng.Next(1, 101);
tallyCount[randomNumber] += 1;
}
foreach(KeyValuePair<int,ulong> kvp in tallyCount)
{
double percentageOfTotal = (double)kvp.Value / (double)totalNumbers;
Console.WriteLine("{0}: {1} --> Percentage: {2}", kvp.Key.ToString(), kvp.Value.ToString(), percentageOfTotal.ToString("P4"));
}
Console.ReadLine();
Output:
How many numbers do you require?
5000
1: 57 --> Percentage: 1.1400 %
2: 43 --> Percentage: 0.8600 %
3: 55 --> Percentage: 1.1000 %
4: 39 --> Percentage: 0.7800 %
5: 57 --> Percentage: 1.1400 %
6: 61 --> Percentage: 1.2200 %
7: 44 --> Percentage: 0.8800 %
8: 52 --> Percentage: 1.0400 %
9: 52 --> Percentage: 1.0400 %
10: 46 --> Percentage: 0.9200 %
11: 52 --> Percentage: 1.0400 %
12: 58 --> Percentage: 1.1600 %
13: 50 --> Percentage: 1.0000 %
14: 36 --> Percentage: 0.7200 %
15: 55 --> Percentage: 1.1000 %
16: 42 --> Percentage: 0.8400 %
17: 46 --> Percentage: 0.9200 %
18: 47 --> Percentage: 0.9400 %
19: 64 --> Percentage: 1.2800 %
20: 55 --> Percentage: 1.1000 %
21: 46 --> Percentage: 0.9200 %
22: 45 --> Percentage: 0.9000 %
23: 49 --> Percentage: 0.9800 %
24: 50 --> Percentage: 1.0000 %
25: 38 --> Percentage: 0.7600 %
26: 60 --> Percentage: 1.2000 %
27: 44 --> Percentage: 0.8800 %
28: 52 --> Percentage: 1.0400 %
29: 57 --> Percentage: 1.1400 %
30: 44 --> Percentage: 0.8800 %
31: 58 --> Percentage: 1.1600 %
32: 53 --> Percentage: 1.0600 %
33: 52 --> Percentage: 1.0400 %
34: 45 --> Percentage: 0.9000 %
35: 43 --> Percentage: 0.8600 %
36: 58 --> Percentage: 1.1600 %
37: 55 --> Percentage: 1.1000 %
38: 59 --> Percentage: 1.1800 %
39: 57 --> Percentage: 1.1400 %
40: 49 --> Percentage: 0.9800 %
41: 51 --> Percentage: 1.0200 %
42: 41 --> Percentage: 0.8200 %
43: 41 --> Percentage: 0.8200 %
44: 46 --> Percentage: 0.9200 %
45: 43 --> Percentage: 0.8600 %
46: 52 --> Percentage: 1.0400 %
47: 56 --> Percentage: 1.1200 %
48: 49 --> Percentage: 0.9800 %
49: 44 --> Percentage: 0.8800 %
50: 65 --> Percentage: 1.3000 %
51: 49 --> Percentage: 0.9800 %
52: 46 --> Percentage: 0.9200 %
53: 51 --> Percentage: 1.0200 %
54: 50 --> Percentage: 1.0000 %
55: 53 --> Percentage: 1.0600 %
56: 44 --> Percentage: 0.8800 %
57: 54 --> Percentage: 1.0800 %
58: 45 --> Percentage: 0.9000 %
59: 59 --> Percentage: 1.1800 %
60: 47 --> Percentage: 0.9400 %
61: 52 --> Percentage: 1.0400 %
62: 45 --> Percentage: 0.9000 %
63: 49 --> Percentage: 0.9800 %
64: 59 --> Percentage: 1.1800 %
65: 50 --> Percentage: 1.0000 %
66: 55 --> Percentage: 1.1000 %
67: 60 --> Percentage: 1.2000 %
68: 46 --> Percentage: 0.9200 %
69: 49 --> Percentage: 0.9800 %
70: 61 --> Percentage: 1.2200 %
71: 48 --> Percentage: 0.9600 %
72: 38 --> Percentage: 0.7600 %
73: 60 --> Percentage: 1.2000 %
74: 44 --> Percentage: 0.8800 %
75: 46 --> Percentage: 0.9200 %
76: 45 --> Percentage: 0.9000 %
77: 50 --> Percentage: 1.0000 %
78: 50 --> Percentage: 1.0000 %
79: 52 --> Percentage: 1.0400 %
80: 48 --> Percentage: 0.9600 %
81: 54 --> Percentage: 1.0800 %
82: 45 --> Percentage: 0.9000 %
83: 56 --> Percentage: 1.1200 %
84: 49 --> Percentage: 0.9800 %
85: 51 --> Percentage: 1.0200 %
86: 55 --> Percentage: 1.1000 %
87: 55 --> Percentage: 1.1000 %
88: 49 --> Percentage: 0.9800 %
89: 57 --> Percentage: 1.1400 %
90: 51 --> Percentage: 1.0200 %
91: 46 --> Percentage: 0.9200 %
92: 48 --> Percentage: 0.9600 %
93: 50 --> Percentage: 1.0000 %
94: 44 --> Percentage: 0.8800 %
95: 53 --> Percentage: 1.0600 %
96: 44 --> Percentage: 0.8800 %
97: 43 --> Percentage: 0.8600 %
98: 47 --> Percentage: 0.9400 %
99: 39 --> Percentage: 0.7800 %
100: 46 --> Percentage: 0.9200 %
3 Answers 3
You've implemented a program to produce a histogram (though in tabular rather than visual form). That lets you judge "by eyeball" whether the distribution looks uniform. You would expect each bin to contain 1% of the samples. But how much deviation is allowable before you lose confidence?
Since you used the word "proof" in your question, I feel compelled to mention that eyeballing would not be good enough proof in academic terms. Statisticians actually have quantitative tests (Pearson's Chi-Squared test) to answer these questions. (Such tests are frequently used in medical publications, for example.)
The hypothesis is: "These outputs of rng.Next(1, 101)
came from a discrete uniform distribution." You would start by calculating \$\chi^2\$. Then, you look up the \$\chi^2\$ value in the table for totalNumbers - 1
degrees of freedom. That gives you the probability that you have a uniformly distributed number generator.
That is Pearson's Chi-squared test. While it would be tricky to actually derive the table and thus automate the entire test, you could at least compute \$\chi^2\,ドル which is easy to do.
-
1\$\begingroup\$ most of this went over my head, are you talking about variance(standard deviation) from a baseline like if I pulled 1000 numbers and just said that there was 10 of each number between 1-100 pulled? \$\endgroup\$Malachi– Malachi2014年10月01日 20:39:07 +00:00Commented Oct 1, 2014 at 20:39
-
\$\begingroup\$ Additional to this answer: 1. Check the NIST's monobits test. 2. Generate just a 0 or a 1 instead of 1..101. 3. Assuming there's a 50/50 chance for a 1 or a 0, the X^2 test gives you a probability that the distribution of your RNG is uniform. You'll want that as close to 1 as possible. 4. Indeed, a Python program giving such an answer is much better than your "look through the eye laces"-approach. \$\endgroup\$agtoever– agtoever2014年10月01日 21:16:48 +00:00Commented Oct 1, 2014 at 21:16
-
1\$\begingroup\$ It would be tricky to automate the test, but you can use a ready implementation from rosetta code ;) C version gives
dof: 99 distance: 75.2400 probability: 0.963911 uniform? Yes
for @Malachi's dataset from the question. \$\endgroup\$abuzittin gillifirca– abuzittin gillifirca2014年10月02日 12:22:33 +00:00Commented Oct 2, 2014 at 12:22 -
\$\begingroup\$ I need to upgrade this application to render a nice histogram(or several) so that I can show what happens when you start retrieving more and more "random" numbers. and give a statement about the results of the my research, and perhaps ask another question. \$\endgroup\$Malachi– Malachi2014年10月09日 19:07:01 +00:00Commented Oct 9, 2014 at 19:07
Using a dictionary here is complete overkill, a simple array would be fine.
It would also be easier to manage if you used random values from 0 100 instead of 1 to 101. Then you could simply index it back to the array.
Random rng = new Random();
UInt64 totalNumbers = 500000;
UInt64[] tallyCount = new UInt64[100];
for (UInt64 i = 0; i < totalNumbers; i++)
{
tallyCount[rng.Next(tallyCount.Length)]++;
}
for(int i = 0; i < tallyCount.Length; i++)
{
double percentageOfTotal = (double)tallyCount[i] / (double)totalNumbers;
Console.WriteLine("{0}: {1} --> Percentage: {2}", i, tallyCount[i], percentageOfTotal.ToString("P4"));
}
The use of a Dictionary for such a simple operation is overkill....
See the Ideone here
-
\$\begingroup\$ agreed. I went with the range from 1-101 because my 100 value wasn't being populated with data, but that was on the RNG too. spot on. \$\endgroup\$Malachi– Malachi2014年10月01日 19:20:08 +00:00Commented Oct 1, 2014 at 19:20
Minor "improvements":
Use a named constant for your upper bound:
const int Maximum = 100;
Initialization of your dictionary can be done with Linq:
var tallyCount = Enumerable.Range(1, Maximum).ToDictionary(i => i, i => 0UL);
You can merge declaration and assignment:
Console.WriteLine("How many numbers do you require?"); UInt64 totalNumbers = Convert.ToUInt64(Console.ReadLine());
-
\$\begingroup\$ lol, #3 I was doing until I tried to get fancy by trying a tryparse but I messed up a bunch of logic and brain started to melt, so I reverted and forgot about the declaration/assignment merge. thanks \$\endgroup\$Malachi– Malachi2014年10月01日 19:00:27 +00:00Commented Oct 1, 2014 at 19:00
long
much more usable thanInt64
in the same way you useint
and notInt32
. \$\endgroup\$UInt64
andulong
are the same thing, I was coding it with the intent that I would be consistent with the typecasting, I am kind of geeky like this, but I likeUInt64
better thanulong
for some reason. I don't know thatlong
is more usable thanInt64
though, why do you phrase it like that @JeroenVannevel? \$\endgroup\$