Given an array of true/false values, what is the most efficient algorithm to select an index with a true value at random.
A sketch simple algorithm is
a <- the array
c <- 0
for i in a:
if a[i] is true: c++
e <- random number in (0, c-1)
j <- 0
for i in e:
while j is false: j++
return j
Can anyone come up with a faster algorithm? Maybe there is a way to only walk through the list once even if the number of true elements is not known at first?
2 Answers 2
Use the "pick a random element from an infinite list" algorithm.
Keep an index of your current pick, and also a count of how many true values you've seen.
When you see a true value, increment the count and then replace your pick with the current index with a probability of P=(1/count). (So you always pick the first one you find... then you might switch to the second one, with probability 1/2, then you might switch to the third one with probabilty 1/3 etc.)
This requires only one scan over the list and constant storage. (It does require you to work out a larger number of random numbers, however.) In particular, it doesn't ever require you to either buffer the list or go back to the start - so it can work on an unbounded input stream.
See this answer for a sample LINQ implementation of the simple "pick a random element" algorithm; it would just need minor tweaks.
4 Comments
Build a list with indexes that point to true values and select one of those at random. Requires O(n) for list traversal and one try for the random number.
ahere might not fit).