How to generate a random number between 1 and N with an M sided die

Question 1

Is there a more general form for the answer to this question where a random number within any range can be generated from a source with any range, while preserving uniform distribution?

This question for example looks familiar and is changing a range of 1-5 to 1-7

Question 2

In general: you can use repeated throws of an $N$ sided die to generate a "decimal" in base $N$. Thus you approximate a uniform distribution on $[0,1]$ so you can certainly generate a random integer in any range you like. Of course, with any finite number of throws there is a non-zero chance that you'll fail but in the limit it works.

Question 3

So the result is any random positive integer, and the random number generator outputs a number with a range from 0 to any positive integer.

Question 4

Yes....but keep in mind that it might fail if you limit the number of throws. Say you try to generate 1,2,3ドル$ with two tosses of a fair coin. You can declare that $HH=1, HT=2,HT=3$ but if you throw $TT$ you fail.

Question 5

Yep in general I was thinking if the numbers don't share common factors, then the rejection sampling comes into the answer.

Question 6

The simplest way to proceed is rejection sampling. For this we need to first assume $M \geq N$. If this isn't the case (maybe you're playing D&D with a d6, so $N=20,M=6$), then you should roll at least $k=\lceil \log_M(N) \rceil$ times, and then label the $k$-tuples of rolls. For the discussion of the rejection approach let us assume you have already done this and accordingly rename $M^k$ as $M$ if need be.

Now rejection sampling amounts to assigning each of the numbers 1,2,ドル\dots,N$ to $q(M,N)$ of the possible rolls, where $q$ is the quotient when $M$ is divided by $N$. If you got one of the assigned rolls, you terminate, otherwise you start over.

This rejection method takes constant space. It fails with probability $\frac{r(M,N)}{M}$ (where $r$ is the remainder when $M$ is divided by $N$), so the average number of steps taken is $\frac{M}{M-r(M,N)}$. In theory this can run forever but the probability of long runtimes decays exponentially fast.

A different way to proceed is to set yourself up to use a dM to generate a random number from any discrete distribution and then apply it to this setting. To do that, note that if $X_n$ are iid rolls from a $M$-sided die then $U=\sum_{n=1}^\infty (X_n-1) M^{-n}$ is uniformly distributed on $[0,1]$. Then given whatever other univariate random variable $X$ you like, if $Q_X$ is its quantile function then $Q_X(U)$ has the same distribution as $X$. This technique is called the probability integral transformation.

Now at first glance this seems silly, because you can't do infinitely many rolls. But because $Q_X$ for discrete $X$ is piecewise constant, you don't need to exactly know what $U$ is. For simulating a dN, you only need to resolve which of the intervals $(k/N,(k+1)/N]$ that $U$ will eventually be in. Given $R$ rolls and a current value of $U$, say $U_R$, you know that the final value of $U$ will be somewhere between $U_R$ and $U_R+\sum_{n=R+1}^\infty (M-1) M^{-n}=U_R+M^{-R}$. If these numbers fall in the same interval of the form above then you are done computing.

Again the runtime of this alternative method is random and not bounded. The space footprint of this method is also random and not bounded. The advantage of it is that it can sometimes conserve prior information instead of just starting over, so that the probability that you will finish in the next step sometimes improves as you go on. Also, in the setting with $M<N$, although you need to roll at least $k=\lceil \log_M(N) \rceil$ times, you do not have to perform a multiple of $k$ rolls, which could be good if for some reason $k$ were large.

Let's do a demonstration of the method with $N=20,M=6$. I roll a 4, putting me in $[3/6,4/6]$, but $\lceil 3 \cdot 20/6 \rceil = 10$ while $\lceil 4 \cdot 20/6 \rceil=14$, so I have only narrowed it down to 5 possibilities so far. Then I roll a 6, putting me in $[23/36,24/36]$. I'm still not done because $\lceil 23 \cdot 20/36 \rceil = 13$ while $ \lceil 24 \cdot 20/36 \rceil=14$, so I have narrowed it to 13 or 14 but it isn't yet clear which. I roll again and get a 1ドル$, and now my roll is understood as a 13.

This "pseudo-probability integral transformation method" does not always succeed at retaining entropy at every step. Returning to this $N=20,M=6$ example, when you expand it all out, the algorithm does two rolls and assigns 20 of the 36 possibilities an outcome number. At this point each outcome needs an additional $(4/5) (1/36)$ unconditional probability. So we use the subsequent rolls to break up the other sixteen possibilities for the first two rolls between the 20 possible d20 rolls. This goes as follows:

80/20 (finish one number, leave one short by $(3/5) (1/36)$)
60/40 (finish the number you just started, leave one short by $(2/5) (1/36)$)
40/60 (finish the number you just started, leave one short by $(1/5) (1/36)$)
20/80 (finish the number you just started and the next one)

and then you repeat that pattern four times. There is no way to implement these 80/20 or 60/40 splits with a d6 while conserving entropy, because the number 3ドル/5$ (resp. 4/5) splits $[3/6,4/6]$ (resp. $[4/6,5/6]$) in the same proportion as each splits $[0,1]$ itself. You can think of that as coming about because 1ドル/5=0.\overline{1}$ in base 6.

Question 7

I don't understand how this doesn't throw away entropy, if M and N do not share common factors I thought the answer usually depends on rejection sampling (re rolling) and then there is entropy thrown away.

Question 8

@daniel The first version throws away more and more entropy each time it fails. The second one does not, because it does not truly fail at all, but keeps the information that was obtained so far.

Question 9

I don't think the second (interval) method has an advantage in regards to entropy compared to the first(power/rejection) method ( - If these two methods have better names let me know). Ex. looking at rolling a 2 sided die to get 1 out of 3, after 4 dice rolls the first method has (1/16) chance of needing a further 2 rolls, the second method has (3/16) chance of needing a further re roll. Also the rejection method might have the option to return part of the rejected number back to the entropy pool.

Question 10

If there was a good name for method 1 and 2 I'd accept this as the answer, as the other answers are equivalent to method 1.

Question 11

@daniel Well, method 1 is a kind of rejection method. Method 2 is a way of using the $M$ sided die as an entropy source to implement the probability integral transformation.

Question 12

A general strategy for obtaining a uniform discrete distribution on 1ドル$ to $N$ using an $M$-sided die is to find the least power $r$ such that $M^r\ge N,ドル then roll the die $r$ times. Assign radix $M$ place-values 0ドル$ to $M-1$ to each face of the die, and compute the radix $M$ value:

$$ m_r m_{r-1} \ldots m_1 $$

using the place-values generated by the rolls. Add 1ドル$ to the result, and if it is not more than $N,ドル accept that as the answer. Otherwise repeat the process until eventually a value between 1ドル$ and $N$ is obtained.

Question 13

I wrote a Python implementation a while back that did this:

def uniform_generator(m, n): #mimics m-sided die using an n-sided die
 """
 Expected number of rolls
 E = r * n^r / m
 where r = int(ceil(log(m, n)))
 """
 r = int(ceil(log(m, n)))
 while True:
 candidate = sum(n**power * randint(0, n-1) for power in range(r)) + 1
 if candidate <= m:
 return candidate

Not optimized or anything, but gets the job done.

How this works (using an $n$-sided die to mimic a $m$-sided die), starting off with two facts:

Generating a number from something like $\{1, 2, 3, ..., n\}$ is the same as generating a number from $\{0, 1, 2, ..., n-1\}$ and adding 1ドル$. We'll focus on this latter version.
If we had $m < n$ (using a bigger die to mimic a smaller one), we could simply continue rolling the $n$-sided die until we got something $\leq m$ and then just take that result. Let's say we wanted to know the probability of rolling a 1ドル$ using this strategy. There is a 1ドル/n$ chance we roll a 1ドル,ドル and a $(n-m)/n$ chance we roll too high and have to start again. This implies $p = 1/n + (n-m)p/n,ドル and we get $p = 1/m,ドル the same as if we had used a $m$-sided die to begin with. The point of this paragraph is to show that you can mimic a smaller uniform distribution by using a larger one and simply trying again if the result is too large.

Moving on: Instead of generating a number from $\{0, 1, 2, ..., n-1\}$ directly, we're actually building a base-$n$ number that is capable of returning any number in this set (and possibly a few more numbers, which we would ignore).

For instance, if you're familiar with binary (base-2ドル$), we could write the number 5ドル$ as $(1 \cdot 2^2) + (0 \cdot 2^1) + (1 \cdot 2^0)$. Each term in parentheses you can think of as a "bit column" -- we're generating each digit of the expansion independently using our $n$-sided die (any digit in base $n$ ranges from 0ドル$ to $n-1$).

In base $n,ドル the smallest number that can be generated is 0ドル,ドル and the largest is $n^r-1$ given $r$ columns. Each result is equally likely since each digit of that result is being generated independently.

We just need to ensure we have enough columns such that we can generate any number from 0ドル$ to $m-1,ドル i.e. where $n^r \geq m,ドル and this occurs at $r = \lceil \log_n(m) \rceil$. Then we use the $n$-sided die to generate a digit for each column and then add up the result from the expansion.

If the result plus 1ドル$ (see fact 1ドル$ from earlier) happens to be greater than $m,ドル we start over (see fact 2ドル$ from earlier).

Question 14

Was the output uniformly distributed? I think flooring or ceilinging increases the bias of some numbers.

Question 15

It's uniformly distributed. The $r$ term is just for determining the largest power of $n$ required.

Question 16

The questions you linked to have strategies that are easily generalized. If $M \gt N$ you can just roll the die, accept any number $\le M$ and roll again if the number is $\gt M$. This is very simple, but may lead to a lot of rolling if $M$ is rather larger than $N$. If $M \gt 2N$ you can take the largest multiple of $N$ that is less than $M$ and accept any roll up to that multiple, take it $\bmod N$ to get the result, and reroll anything larger.

You can just roll a number of times, add up the sum, and take that $\bmod N$. This will not be exactly even, but it will be very close.

Question 17

I forgot to include uniform distribution, I was thinking in terms of gambling and statistics so adding bias with the mod N is out

Question 18

and if If M<N you need to do, other methods.

Ian 105k5 gold badges101 silver badges171 bronze badges · Accepted Answer · 2017-04-24 16:12:30Z