I'm sure this is a fully 'solved' problem and I'm expecting cross-references, but I can't find this on Google. I'm probably missing a keyword.
Here is a naive function for populating a bool array (b[i]) of length N with independent Bernoulli random variables: P(b[i]==true) == p and P(b[i]==false) == (1-p) for each i.
#include <stdlib.h>
#include <stdio.h>
#include <stdbool.h>
void randomize(bool *const bits, const size_t len, const double p) {
    for (size_t i = 0; i < len; ++i) {
        // Force the hand for p >= 1.0: rand() can return both 0 and RAND_MAX,
        // so rand() < p*RAND_MAX could fail even when p == 1.0.
        if (p >= 1.0 || rand() < p * RAND_MAX) {
            bits[i] = true;
        } else {
            bits[i] = false;
        }
    }
}
int main(void) {
    const size_t len = 100;                    // N in the text.
    bool *bits = malloc(len * sizeof(*bits));  // b[i] in the text.
    if (bits == NULL) {
        return EXIT_FAILURE;
    }
    srand(1234);  // Fixed seed for reproducibility.
    randomize(bits, len, 0.3);
    int count = 0;
    for (size_t i = 0; i < len; ++i) {
        printf("%c", bits[i] ? '1' : '0');
        if (bits[i]) {
            ++count;
        }
    }
    // The printed value is the observed fraction of set bits, not the raw count.
    printf("\nfraction=%f\n", (double)count / (double)len);
    free(bits);
    return EXIT_SUCCESS;
}
Obviously it's linear \$\mathcal{O}(N)\$ for 'seeks' in the array.
However, I could start with an array full of 0s, calculate a random variable \$B \sim \mathrm{Binomial}(N, p)\$ to get the number of bits to set, and then select those positions at random.
If \$p\$ is small, that will involve far fewer seeks. It will still be asymptotically linear in expectation, but with a far smaller constant.
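For concreteness, here is one way to draw \$B\$: invert the binomial CDF term by term. This is only a sketch, and binomial_draw is a name I'm introducing for illustration; the running product underflows for large \$N\$, so real code would use a dedicated sampler from a numerics library.
#include <stdlib.h>
#include <math.h>
/* Draw B ~ Binomial(n, p) by CDF inversion, using the recurrence
   P(B == k+1) = P(B == k) * ((n-k)/(k+1)) * (p/(1-p)). */
size_t binomial_draw(const size_t n, const double p) {
    if (p <= 0.0) return 0;
    if (p >= 1.0) return n;
    const double u = rand() / ((double)RAND_MAX + 1.0); /* uniform in [0,1) */
    double pk = pow(1.0 - p, (double)n);                /* P(B == 0) */
    double cdf = pk;
    size_t k = 0;
    while (cdf <= u && k < n) {
        pk *= ((double)(n - k) / (double)(k + 1)) * (p / (1.0 - p));
        ++k;
        cdf += pk;
    }
    return k;
}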
If I naively pick the \$B\$ elements to set, I might pick the same one twice (birthday paradox, anyone?) and will need to pick again. If I just 'scan' forwards to the first free slot, I will introduce a 'clumping' bias and violate the independence of the random variables: \$P(b[i+1] = 1 \mid b[i] = 1) \gt p\$. Clumping is a no-no for my application.
Therein lies the rub: if \$p\$ is near 1.0, it will take an increasingly massive amount of time to fill the array. When it's nearly full, randomly picking a remaining 0 will typically take a lot of retries.
I can be 'clever': if \$B \gt \frac{N}{2}\$ I can block-set everything to 1 and then set \$N-B\$ elements back to 0. Counting only 'seeks', that means I will always beat the full scan above, because the average number of retries at each step is \$\le 1\$, so the expected seek count is \$\lt N\$.
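Putting that together, here is a minimal sketch of the whole scheme under the assumptions above (binomial_draw is the illustrative sampler from earlier). On the array, the initial block fill is itself O(N) visits; on a structure with a cheap block fill, like the 'tree of false'/'tree of true' mentioned in the PS below, only the random flips count as seeks.
#include <stdlib.h>
#include <stdbool.h>
void randomize_binomial(bool *const bits, const size_t len, const double p) {
    size_t b = binomial_draw(len, p);  /* number of true bits to place */
    bool majority = false;
    if (b > len / 2) {   /* block-fill with the majority value... */
        majority = true;
        b = len - b;     /* ...so at most N/2 positions need flipping */
    }
    for (size_t i = 0; i < len; ++i) {
        bits[i] = majority;
    }
    while (b > 0) {
        const size_t i = (size_t)rand() % len;  /* modulo bias ignored in this sketch */
        if (bits[i] == majority) {  /* on a collision, just retry (the birthday-paradox case) */
            bits[i] = !majority;
            --b;
        }
    }
}
Since the fraction of already-flipped slots never exceeds 1/2, each flip needs at most 2 picks in expectation, so the total expected number of picks is at most \$N\$, matching the claim above.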
Does anyone have such a piece of code readily on hand?
PS: The data structure I'm working with isn't an array; it's a particular kind of tree. That's not important, but it might explain why I'm counting 'seeks' as the overriding parameter. However, I am able to easily conjure up a 'tree of false' and a 'tree of true' to implement that 'block fill' requirement.
-
@Jamal Thanks for the edits, particularly the maths formatting. Beyond my knowledge. PS: If I could be bothered, I'd propose upvoting for edits on Meta Code Review. (user59064, Jan 4, 2015 at 20:06)
-
You're very kind. :-) I don't do it for the rep; I do it for site quality. This isn't the place to talk about it, though. Feel free to post such a proposal if you wish. (Jamal, Jan 4, 2015 at 20:16)
1 Answer
From what I can see, the code fills an array with a given number of true values (the rest being false) in random order. I'd rather initialize the first p*N elements to true, and then random_shuffle the array.
That said, I don't understand why you need this array at all. Can we see how your application uses it?
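For reference, here is a minimal C sketch of that suggestion (random_shuffle is C++, and a comment below mentions the code is in C), using a Fisher-Yates shuffle; randomize_shuffle is an illustrative name. Note that this fixes the number of true values at exactly round(p*N), rather than drawing it from the binomial distribution as the question proposes.
#include <stdlib.h>
#include <stdbool.h>
/* Set the first round(p*N) slots to true, then Fisher-Yates shuffle. */
void randomize_shuffle(bool *const bits, const size_t len, const double p) {
    const size_t ones = (size_t)(p * (double)len + 0.5);
    for (size_t i = 0; i < len; ++i) {
        bits[i] = (i < ones);
    }
    for (size_t i = len; i > 1; --i) {        /* shuffle bits[0..i-1] */
        const size_t j = (size_t)rand() % i;  /* j in [0, i-1]; modulo bias ignored in this sketch */
        const bool tmp = bits[i - 1];
        bits[i - 1] = bits[j];
        bits[j] = tmp;
    }
}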
-
As to what I'm doing, I'll drop a hint. As stated, I'm not working with an array, but I think using my actual data structure may be distracting to this question. What I'm working with is a particular kind of tree in which each non-leaf node has exactly 4 child nodes. It may be that for higher values of p it's easier to recursively build the tree randomly from scratch. However, it's obvious that for small values of p it's easier to follow my proposed binomial approach, and the 'critical' point for p is somewhere in the range most relevant to me (say 0.1 to 0.5). (user59064, Jan 5, 2015 at 7:40)
-
That said, when p=0.5 there are numerous performance-critical cryptographic applications in which N (bit-length) is non-trivial. In part that's why I'm surprised by the lack of ready answers. Though crypto usually considers a packed array of bits, which doesn't massively apply to me, though it could build a 'palette' of sub-trees and put them together. I lied a bit about it being a tree. It looks like a tree, but duplicate sub-trees have been merged. Another clue. (user59064, Jan 5, 2015 at 7:55)
-
Random shuffle will require (2+p)*N visits on average: 1 visit for each set and 2 visits for each swap. I'm also working in C, though obviously the algorithm is easy to implement! (user59064, Jan 5, 2015 at 8:33)