Inspired by this SO post, I'd like to generate random bitstrings of a length $n$, having at most $k$ bits set for some $k$. Bitstrings should be selected uniformly at random amongst all possible bitstrings. (The original question also asks them to be unique, but that's easily handled with e.g. a set to catch duplicates).
Currently, I have a few approaches, none of which is ideal:
- For large $k$ ($\approx n/2$), just perform rejection sampling across all bitstrings of length $k$.
- For very small $k$, we can explicitly calculate the distribution of 1-bit counts via the binomial coefficients, then randomly select a target count from this distribution. (Generating bitstrings with exactly k bits set is much easier)
- For general use, choose a sequence of length $n$ out of a pool of $n$ 0s and $k$ 1s. While efficient, this approach is biased in favour of 1-bit counts near $k/2$.
Is there an efficient and correct algorithm for selecting random bitstrings with up to $k$ bits set?
-
$\begingroup$ Partial answer: cs.stackexchange.com/questions/67664/… $\endgroup$Pseudonym– Pseudonym ♦2024年09月26日 06:55:25 +00:00Commented Sep 26, 2024 at 6:55
-
$\begingroup$ @Pseudonym: as noted in my question, "generating bitstrings with exactly k bits set is much easier". $\endgroup$nneonneo– nneonneo2024年09月27日 01:20:02 +00:00Commented Sep 27, 2024 at 1:20
1 Answer 1
Yes. Let $f(n,k)$ denote the number of such bitstrings (i.e., number of $n$-bit strings containing $\le k$ ones). Then it is easy to sample from this set. In particular, use the following recursive algorithm:
Sample the first bit of such a string by choosing 1 with probability $p$ and 0 with probability 1ドル-p$, where $p=f(n-1,k-1)/f(n,k)$. If the first bit is 1, pick the remaining $n-1$ bits by recursively sampling from $n-1$-bit strings containing $\le k-1$ ones. If the first bit is 0, pick the remaining $n-1$ bits by recursively sampling from $n-1$-bit strings containing $\le k$ ones.
So all that remains is to describe an algorithm to compute $f(n,k)$. And this is easy to compute using dynamic programming, by taking advantage of the recurrence
$$f(n,k) = f(n-1,k-1) + f(n-1,k),$$
with base cases $f(n,0)=1$.
-
3$\begingroup$ I think OP knows this algorithm (their second bullet point), but it really isn't efficient - it needs bignum arithmetic and potentially many random bits per output bit. It's better for exactly $k$ set bits, because then $p=k/n$. $\endgroup$benrg– benrg2024年09月26日 20:38:16 +00:00Commented Sep 26, 2024 at 20:38