Random selection

Question 1

Given two integer numbers N and n (N>= n> 0), how do I generate random selection (without repetition!) of [0, N) with length = n? E.g. Given N = 5, n = 3 possible solutions are (3,0,2) or (2,4,1), etc.

There is a restriction that prevents of using naive approach: memory usage must be O(n), not O(N).

/* Under naive approach I mean usage of temporary array of size=N that's initially filled in with numbers 0..N-1 in order. Required n items are selected randomly from this array. */

Question 2

You should probably make the title more informative. "Random selection" doesn't say much.

Question 3

Did you try searching for past questions on this?

Question 4

possible duplicate of Algorithm to select a single, random combination of values?

Question 5

Go through all the numbers m from 0 to N, deciding whether to include m in the set as encountered. You need to update the probability of including the next number based on the numbers already treated.

Let's apply this idea to the example given, with n=3 and N=5. First consider m=0. There are 3 numbers remaining, and 5 possibilities, so 0 is in the set with probability 3/5. Use a random number generator to decide to include the number or not. Now consider m=1. If you included 0 in the set, then you have 2 numbers remaining and 4 possibilities, so it should be included with probability 2/4, but if 0 is not included, you have 3 numbers remaining and 4 possibilities and thus 1 should be included with probability 3/4. This continues until the required 3 numbers are included in the set.

Here's an implementation in Python:

from __future__ import division
import random
def rand_set(n, N):
 nums_included=set()
 for m in range(N):
 prob = (n-len(nums_included)) / (N-m)
 if random.random() < prob:
 nums_included.add(m)
 return nums_included

You could (and probably should) add in a test to see when you've got enough numbers in your set, and break out of the loop early.

The numbers are stored in a set, which varies in size from 0 to n, so the storage used is O(n). Everything else uses constant space, so it's overall O(n).

EDIT Actually, you can go a little further with this approach, so that it takes constant space. In Python, just make a generator based on the above:

def rand_set_iter(n, N):
 num_remaining = n
 m = 0
 while num_remaining > 0:
 prob = num_remaining / (N-m)
 if random.random() < prob:
 num_remaining -= 1
 yield m
 m += 1

Here, I've gone ahead and used a while loop instead of the for loop. To store the results, you'll of course need to use O(n) space. But if all you need to do is iterate through the numbers, the generator version does it in O(1).

For a language without generators, you can roll your own generator, calling a function repeatedly and updating a static or global variable.

Question 6

Excellent. Definitely the way to go.

Question 7

This is called Reservoir Sampling, I believe.

Question 8

@Moron I never knew the name for the approach. Thanks!

Question 9

A Mary Poppins answer. Practically perfect in every way.

Question 10

The simple (but potentially very inefficient) solution is just to build a list by repeatedly picking a value in the desired range, and checking whether or not you've already picked it. This has an unbounded maximum time, because you could always end up accidentally picking something you've already picked.

I have a vague inkling of an O(n²) solution which in each iteration picks a value in the range [0, N - i) where i is the number of elements you've already got... and then mapping that new value onto the range [0, N) by going through the existing picked elements and adding 1 if you find you've already got a value less than or equal to the value you've picked. You'd need to think about it carefully, but that's effectively the approach I'd look into.

Question 11

Even your simple solution is pretty good. If n < N/2, then expected number of failures is never more than 2. +1!

Question 12

In python, this would be really easy:

selection = random.shuffle(range(N))[:n]

This is O(N) in memory, since the list of valid values is generated first and then shuffled in place, so it fails on your requirement :(

You could try something like this:

N = 5
n = 3
selection = set()
while len(selection) < n:
 selection += pick_random_int(0, N)

Which is essentially what Jon Skeet proposed. This will work well for n << N, but start to fail horribly with n close to N. In that case, though, the O(n) and O(N) memory solutions will converge anyway and your requirement is moot ;)

Question 13

Divide the interval [0,N] to n intervals. From each interval select a random number and then randomize the result. The problem is that in this situation the the distribution is not uniformed.

Michael J. Barber 25.2k9 gold badges71 silver badges92 bronze badges · Accepted Answer · 2011-03-24 09:39:19Z

Go through all the numbers m from 0 to N, deciding whether to include m in the set as encountered. You need to update the probability of including the next number based on the numbers already treated.

Let's apply this idea to the example given, with n=3 and N=5. First consider m=0. There are 3 numbers remaining, and 5 possibilities, so 0 is in the set with probability 3/5. Use a random number generator to decide to include the number or not. Now consider m=1. If you included 0 in the set, then you have 2 numbers remaining and 4 possibilities, so it should be included with probability 2/4, but if 0 is not included, you have 3 numbers remaining and 4 possibilities and thus 1 should be included with probability 3/4. This continues until the required 3 numbers are included in the set.

Here's an implementation in Python:

from __future__ import division
import random
def rand_set(n, N):
 nums_included=set()
 for m in range(N):
 prob = (n-len(nums_included)) / (N-m)
 if random.random() < prob:
 nums_included.add(m)
 return nums_included

You could (and probably should) add in a test to see when you've got enough numbers in your set, and break out of the loop early.

The numbers are stored in a set, which varies in size from 0 to n, so the storage used is O(n). Everything else uses constant space, so it's overall O(n).

EDIT Actually, you can go a little further with this approach, so that it takes constant space. In Python, just make a generator based on the above:

def rand_set_iter(n, N):
 num_remaining = n
 m = 0
 while num_remaining > 0:
 prob = num_remaining / (N-m)
 if random.random() < prob:
 num_remaining -= 1
 yield m
 m += 1

Here, I've gone ahead and used a while loop instead of the for loop. To store the results, you'll of course need to use O(n) space. But if all you need to do is iterate through the numbers, the generator version does it in O(1).

For a language without generators, you can roll your own generator, calling a function repeatedly and updating a static or global variable.

CollectivesTM on Stack Overflow

Random selection

4 Answers 4

4 Comments

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

CollectivesTM on Stack Overflow

4 Answers 4

4 Comments

1 Comment

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related