I have a list of N
integers, e.g. intList=[1,7,3,1,5]
with N=5
and all integers have a binary representation of lenght l
, e.g. binList=["001","111","011","001","101"]
with l=3
. Now I want to add random flips from 1->0 & 0->1
. Each of the l
positions should flip with a probabiliy mu
. (mu
varies over many magnitudes & N
is around 100-100000) This happens over many iterations. Currently my code is
@nb.njit()
for x in range(iterations):
...other code...
for number in range(N):
for position in range(l):
if random.random() < mu:
intList[number] = intList[number] ^ (1 << position)
...other code...
The numba decorator is necessary. Since this part takes most of the execution time I just want to make sure that it is already optimal. Thanks for any comment!
3 Answers 3
It is absolutely not optimal for a small mu
. It's generally more efficient to first generate how many bits you'd like to flip using a Binomial distribution, and then sample the indices. However, then we have to generate words with k
set bits, which involves more expensive RNG calls. However when mu
is very small and l
is very large, this is the most efficient method.
But for l
that fit inside a single computer word, we can be more efficient. Note that we can first solve the problem for a small w <= l
(like 8 or 16), and then combine multiple copies of these w
into a single larger integer.
Our main technique is to precompute an alias method table for all integers up to width w
to sample them according to the number of bits set. This is actually feasible for small w
(say, up to 16). So without further ado:
import numpy as np
from collections import deque
class VoseAliasMethod:
# Vose's Alias Method as described at https://www.keithschwarz.com/darts-dice-coins/.
def __init__(self, weights):
pmf = weights / np.sum(weights)
self.n = pmf.shape[0]
self.prob = np.zeros(self.n, dtype=np.float64)
self.alias = np.zeros(self.n, dtype=np.int64)
p = pmf * self.n
small = deque(np.nonzero(p < 1.0)[0])
large = deque(np.nonzero(p >= 1.0)[0])
while small and large:
l = small.popleft()
g = large.popleft()
self.prob[l] = p[l]
self.alias[l] = g
p[g] = (p[g] + p[l]) - 1.0
(small if p[g] < 1.0 else large).append(g)
self.prob[small] = 1.0
self.prob[large] = 1.0
def sample(self, size):
ri = np.random.randint(0, self.n, size=size)
rx = np.random.uniform(size=size)
return np.where(rx < self.prob[ri], ri, self.alias[ri])
And the sampling code itself, following the parameters from your question here:
w = 10; mu = 0.0001 # Or whatever parameters you wish.
popcount = np.array([bin(n).count("1") for n in range(2**w)])
pmf = np.exp(popcount*np.log(mu) + (w - popcount)*np.log(1 - mu))
sampler = VoseAliasMethod(pmf)
l = 10; n = 10000 # Or however many you'd want of length < 64.
words_needed = int(np.ceil(l / w))
intlist = np.zeros(n, dtype=np.int64)
for _ in range(10000):
raw_samples = sampler.sample((n, words_needed))
if l % w != 0: raw_samples[:,-1] >>= words_needed*w - l
raw_samples <<= w*np.arange(words_needed)
result = np.bitwise_or.reduce(raw_samples, axis=1)
intlist ^= result
Compared to your implementation from your other question this runs in ~2.2 seconds as opposed to ~6.1 seconds for your numba
implementation.
-
\$\begingroup\$ Thanks! This looks really promising. I'll have a closer look later this day and then I'll try to understand what you have done. \$\endgroup\$HighwayJohn– HighwayJohn2020年11月19日 14:05:41 +00:00Commented Nov 19, 2020 at 14:05
You could replace
for position in range(l):
if random.random() < mu:
intList[number] = intList[number] ^ (1 << position)
with
for bit in bits:
if random() < mu:
intList[number] ^= bit
after preparing bits = [1 << position for position in range(l)]
before the show and importing the random
function.
Don't know whether that helps when using numba, and you didn't provide benchmark code.
l
is a poor variable name. It doesn't convey any information about what it is and it is easy to confuse with the number 1
. I'll use nbits
.
If nbits
is not too large, it might make sense to precompute a table of possible bit flips and their probablities. For a probability of mu that a bit flips, the probability that it doesn't flip is (1 - mu)
. The probability that no bits are flipped is (1 - mu)**nbits
; that only 1 bit flips is mu*(1 - mu)**(nbits - 1)
; that two are flipped is (mu**2)*(1 - mu)**(nbits - 2)
; and so on. For each number of flipped bits, each pattern is equally likely.
For the sample problem above, nbits
is 3, and there are 8 possible bit flips: [0b000, 0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111]. There is one possibility of no bits flipped which has a probability of (1 - mu)**3
. There are 3 possibilities with 1 bit flipped; each with a probablility of (mu*(1 - mu)**2)
. For 2 bits, there are also 3 possibilities, each with a probability of (mu**2)*(1 - mu)
. Lastly, the probability that all bits are flipped is mu**3
. So we get:
p = [(1-mu)**3, (mu*(1-mu)**2), ((mu**2)*(1-mu)), mu**3]
flips = [0b000, 0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111]
weights = [p[0], p[1], p[1], p[2], p[1], p[2], p[2], p[3]]
Obviously, for larger nbits
, you would have a function that calculates the probabilities and weights.
Then use random.choices()
to pick the bit flips based on the weights:
for number in range(N):
intList[number] ^= random.choices(flips, weight=weights)[0]
According to the docs, it is a bit more efficient to use cumulative weights.
import itertools
cum_weights = list(itertools.accumulate(weights))
Note that flips
is the same as range(8)
, so we can do:
for number in range(N):
intList[number] ^= random.choices(range(2**nbits), cum_weights=cum_weights)[0]
Lastly, if N
isn't too big, we can use the k
parameter to pick all the flips for the entire list in one call.
for index, bits in enumerate(random.choices(range(2**nbits), weights=weights, k=N)):
intList[index] ^= bits
It nbits
is too large to make a table for all possible bit flips, make a smaller table. Then use bit-shifts and or-operations to get the required number of bits. For example, with a table for 8-bits, a 16-bit flip can be calculated like this:
hi, lo = random.choices(range(2**nbits), cum_weights=cum_weights, k=2)
flip = hi << 8 | lo
-
\$\begingroup\$ "There are 3 possibilities with 1 bit flipped; each with a probablility of
(mu*(1 - mu)**2)/3
". This is incorrect. Each of them has a probablility ofmu*(1 - mu)**2
. Same with the 2-flipped-bit case. \$\endgroup\$GZ0– GZ02020年11月15日 10:55:02 +00:00Commented Nov 15, 2020 at 10:55 -
\$\begingroup\$ @GZ0, Thanks and fixed. \$\endgroup\$RootTwo– RootTwo2020年11月15日 17:11:25 +00:00Commented Nov 15, 2020 at 17:11
-
\$\begingroup\$ Some minor improvements for efficiency: (1) using a list comprehension
intList = [v ^ flipped for v, flipped in zip(intList, choices(..., k=N))]
; (2) in the computation ofp
, direct multiplication for small integer powers is more efficient than**
, which needs to deal with arbitrary floating-point powers; (3)1-mu
does not need to be computed multiple times. \$\endgroup\$GZ0– GZ02020年11月15日 17:45:50 +00:00Commented Nov 15, 2020 at 17:45 -
\$\begingroup\$ Suggestion for computing the weights:
weights = [1]
and thenfor _ in range(nbits): weights = [w * p for p in (1-mu, mu) for w in weights]
. \$\endgroup\$superb rain– superb rain2020年11月15日 19:01:22 +00:00Commented Nov 15, 2020 at 19:01
n
andl
? \$\endgroup\$