Random changes in a list of binary numbers

Question 1

I have a list of N integers, e.g. intList=[1,7,3,1,5] with N=5 and all integers have a binary representation of lenght l, e.g. binList=["001","111","011","001","101"] with l=3. Now I want to add random flips from 1->0 & 0->1. Each of the l positions should flip with a probabiliy mu. (mu varies over many magnitudes & N is around 100-100000) This happens over many iterations. Currently my code is

@nb.njit()
for x in range(iterations):
 ...other code...
 for number in range(N):
 for position in range(l):
 if random.random() < mu:
 intList[number] = intList[number] ^ (1 << position)
 ...other code...

The numba decorator is necessary. Since this part takes most of the execution time I just want to make sure that it is already optimal. Thanks for any comment!

Question 2

What is the expected magnitude of n and l?

Question 3

Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers .

Question 4

You may post the benchmark as an answer. Updating the question based on feedbacks is prohibited on the forum.

Question 5

@GZ0 Where exactly does it say so? We even frequently request updating questions (usually for clarity).

Question 6

@HighwayJohn Some people with too much power apparently prefer not to read and think, so yeah, better post it as an answer. Might even gain some upvotes.

Question 7

It is absolutely not optimal for a small mu. It's generally more efficient to first generate how many bits you'd like to flip using a Binomial distribution, and then sample the indices. However, then we have to generate words with k set bits, which involves more expensive RNG calls. However when mu is very small and l is very large, this is the most efficient method.

But for l that fit inside a single computer word, we can be more efficient. Note that we can first solve the problem for a small w <= l (like 8 or 16), and then combine multiple copies of these w into a single larger integer.

Our main technique is to precompute an alias method table for all integers up to width w to sample them according to the number of bits set. This is actually feasible for small w (say, up to 16). So without further ado:

import numpy as np
from collections import deque
class VoseAliasMethod:
 # Vose's Alias Method as described at https://www.keithschwarz.com/darts-dice-coins/.
 def __init__(self, weights):
 pmf = weights / np.sum(weights)
 self.n = pmf.shape[0]
 self.prob = np.zeros(self.n, dtype=np.float64)
 self.alias = np.zeros(self.n, dtype=np.int64)
 p = pmf * self.n
 small = deque(np.nonzero(p < 1.0)[0])
 large = deque(np.nonzero(p >= 1.0)[0])
 while small and large:
 l = small.popleft()
 g = large.popleft()
 self.prob[l] = p[l]
 self.alias[l] = g
 p[g] = (p[g] + p[l]) - 1.0
 (small if p[g] < 1.0 else large).append(g)
 self.prob[small] = 1.0
 self.prob[large] = 1.0
 def sample(self, size):
 ri = np.random.randint(0, self.n, size=size)
 rx = np.random.uniform(size=size)
 return np.where(rx < self.prob[ri], ri, self.alias[ri])

And the sampling code itself, following the parameters from your question here:

w = 10; mu = 0.0001 # Or whatever parameters you wish.
popcount = np.array([bin(n).count("1") for n in range(2**w)])
pmf = np.exp(popcount*np.log(mu) + (w - popcount)*np.log(1 - mu))
sampler = VoseAliasMethod(pmf)
l = 10; n = 10000 # Or however many you'd want of length < 64.
words_needed = int(np.ceil(l / w))
intlist = np.zeros(n, dtype=np.int64)
for _ in range(10000):
 raw_samples = sampler.sample((n, words_needed))
 if l % w != 0: raw_samples[:,-1] >>= words_needed*w - l
 raw_samples <<= w*np.arange(words_needed)
 result = np.bitwise_or.reduce(raw_samples, axis=1)
 intlist ^= result

Compared to your implementation from your other question this runs in ~2.2 seconds as opposed to ~6.1 seconds for your numba implementation.

Question 8

Thanks! This looks really promising. I'll have a closer look later this day and then I'll try to understand what you have done.

Question 9

You could replace

 for position in range(l):
 if random.random() < mu:
 intList[number] = intList[number] ^ (1 << position)

with

 for bit in bits:
 if random() < mu:
 intList[number] ^= bit

after preparing bits = [1 << position for position in range(l)] before the show and importing the random function.

Don't know whether that helps when using numba, and you didn't provide benchmark code.

Question 10

l is a poor variable name. It doesn't convey any information about what it is and it is easy to confuse with the number 1. I'll use nbits.

If nbits is not too large, it might make sense to precompute a table of possible bit flips and their probablities. For a probability of mu that a bit flips, the probability that it doesn't flip is (1 - mu). The probability that no bits are flipped is (1 - mu)**nbits; that only 1 bit flips is mu*(1 - mu)**(nbits - 1); that two are flipped is (mu**2)*(1 - mu)**(nbits - 2); and so on. For each number of flipped bits, each pattern is equally likely.

For the sample problem above, nbits is 3, and there are 8 possible bit flips: [0b000, 0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111]. There is one possibility of no bits flipped which has a probability of (1 - mu)**3. There are 3 possibilities with 1 bit flipped; each with a probablility of (mu*(1 - mu)**2). For 2 bits, there are also 3 possibilities, each with a probability of (mu**2)*(1 - mu). Lastly, the probability that all bits are flipped is mu**3. So we get:

p = [(1-mu)**3, (mu*(1-mu)**2), ((mu**2)*(1-mu)), mu**3]
flips = [0b000, 0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111]
weights = [p[0], p[1], p[1], p[2], p[1], p[2], p[2], p[3]]

Obviously, for larger nbits, you would have a function that calculates the probabilities and weights.

Then use random.choices() to pick the bit flips based on the weights:

for number in range(N):
 intList[number] ^= random.choices(flips, weight=weights)[0]

According to the docs, it is a bit more efficient to use cumulative weights.

import itertools
cum_weights = list(itertools.accumulate(weights))

Note that flips is the same as range(8), so we can do:

for number in range(N):
 intList[number] ^= random.choices(range(2**nbits), cum_weights=cum_weights)[0]

Lastly, if N isn't too big, we can use the k parameter to pick all the flips for the entire list in one call.

for index, bits in enumerate(random.choices(range(2**nbits), weights=weights, k=N)):
 intList[index] ^= bits

It nbits is too large to make a table for all possible bit flips, make a smaller table. Then use bit-shifts and or-operations to get the required number of bits. For example, with a table for 8-bits, a 16-bit flip can be calculated like this:

hi, lo = random.choices(range(2**nbits), cum_weights=cum_weights, k=2)
flip = hi << 8 | lo

Question 11

"There are 3 possibilities with 1 bit flipped; each with a probablility of (mu*(1 - mu)**2)/3". This is incorrect. Each of them has a probablility of mu*(1 - mu)**2. Same with the 2-flipped-bit case.

Question 12

@GZ0, Thanks and fixed.

Question 13

Some minor improvements for efficiency: (1) using a list comprehension intList = [v ^ flipped for v, flipped in zip(intList, choices(..., k=N))]; (2) in the computation of p, direct multiplication for small integer powers is more efficient than **, which needs to deal with arbitrary floating-point powers; (3) 1-mu does not need to be computed multiple times.

Question 14

Suggestion for computing the weights: weights = [1] and then for _ in range(nbits): weights = [w * p for p in (1-mu, mu) for w in weights].

orlp orlp 3911 silver badge8 bronze badges · Accepted Answer · 2020-11-19 02:11:41Z

It is absolutely not optimal for a small mu. It's generally more efficient to first generate how many bits you'd like to flip using a Binomial distribution, and then sample the indices. However, then we have to generate words with k set bits, which involves more expensive RNG calls. However when mu is very small and l is very large, this is the most efficient method.

But for l that fit inside a single computer word, we can be more efficient. Note that we can first solve the problem for a small w <= l (like 8 or 16), and then combine multiple copies of these w into a single larger integer.

Our main technique is to precompute an alias method table for all integers up to width w to sample them according to the number of bits set. This is actually feasible for small w (say, up to 16). So without further ado:

import numpy as np
from collections import deque
class VoseAliasMethod:
 # Vose's Alias Method as described at https://www.keithschwarz.com/darts-dice-coins/.
 def __init__(self, weights):
 pmf = weights / np.sum(weights)
 self.n = pmf.shape[0]
 self.prob = np.zeros(self.n, dtype=np.float64)
 self.alias = np.zeros(self.n, dtype=np.int64)
 p = pmf * self.n
 small = deque(np.nonzero(p < 1.0)[0])
 large = deque(np.nonzero(p >= 1.0)[0])
 while small and large:
 l = small.popleft()
 g = large.popleft()
 self.prob[l] = p[l]
 self.alias[l] = g
 p[g] = (p[g] + p[l]) - 1.0
 (small if p[g] < 1.0 else large).append(g)
 self.prob[small] = 1.0
 self.prob[large] = 1.0
 def sample(self, size):
 ri = np.random.randint(0, self.n, size=size)
 rx = np.random.uniform(size=size)
 return np.where(rx < self.prob[ri], ri, self.alias[ri])

And the sampling code itself, following the parameters from your question here:

w = 10; mu = 0.0001 # Or whatever parameters you wish.
popcount = np.array([bin(n).count("1") for n in range(2**w)])
pmf = np.exp(popcount*np.log(mu) + (w - popcount)*np.log(1 - mu))
sampler = VoseAliasMethod(pmf)
l = 10; n = 10000 # Or however many you'd want of length < 64.
words_needed = int(np.ceil(l / w))
intlist = np.zeros(n, dtype=np.int64)
for _ in range(10000):
 raw_samples = sampler.sample((n, words_needed))
 if l % w != 0: raw_samples[:,-1] >>= words_needed*w - l
 raw_samples <<= w*np.arange(words_needed)
 result = np.bitwise_or.reduce(raw_samples, axis=1)
 intlist ^= result

Compared to your implementation from your other question this runs in ~2.2 seconds as opposed to ~6.1 seconds for your numba implementation.

Thanks! This looks really promising. I'll have a closer look later this day and then I'll try to understand what you have done.

Stack Exchange Network

Random changes in a list of binary numbers

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Random changes in a list of binary numbers

3 Answers 3

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions