13

I have a case where I need to select a random item, but I don't know the total number of items and I don't want to build a huge array then pick an item out. For example, this is what I have right now:

List<string> items;
while (true)
{
 string item = GetNextItem();
 if (item == null)
 break;
}
int index = random.GetNext(0, items.count);

As you can see, I'm building a gigantic collection that I really don't need, I just need a random number between 0 and the number of items. Here is what I am thinking of doing, and it works, but I'd like to know if any of the experts out there can find a fault with it:

int index = -1;
int total;
string selectedItem;
while (true)
{
 string item = GetNextItem();
 if (item == null)
 break;
 ++total;
 int rnd = random.Next(0, total);
 if (rnd == total- 1)
 {
 index = total- 1;
 selectedItem = item;
 }
}

This gives me my index number, and the randomly selected item. My thinking behind this is that when there are 3 total items, for example, I pick a random number between 0 and 2 (inclusive) and if it's equal to 2 I use the new item as the selected item, if not just ignore it. As the total number of items increases, each new item's chance of being selected decreases accordingly.

Is this method "good"? Is it as "random" as building the array and picking an item out later? Is it as fast as it can be? Please guide me.

desertnaut
60.8k32 gold badges155 silver badges183 bronze badges
asked Mar 7, 2010 at 5:50
7
  • @Sky Sanders: late-night Saturday post... Commented Mar 7, 2010 at 5:57
  • LOL - late night post for sure. :) I have an infinite loop that keeps pulling items from a source until a null item is returned. I need to pick one of those items at random. Commented Mar 7, 2010 at 5:58
  • I think you've use 'count' when you mean 'total' in your second block of code Commented Mar 7, 2010 at 6:12
  • 2
    If stated clearly, that would make a pretty good interview question. I'll have to keep it in mind. Commented Mar 7, 2010 at 6:27
  • This is a classic Amazon.com interview question -- I think everybody who's interviewed there has been asked this at least once! Commented Mar 7, 2010 at 6:35

4 Answers 4

14

What you're doing will work.

Here's a restating of it that might make the algorithm slightly more clear:

  1. Select the first item, there is a 100% chance it will be the current selection
  2. If there is a second item, there is a 1/2 chance it will replace the current selection (If you do the math, then it's a 50% chance it will be the first item, and a 50% chance it will be the second item)
  3. If there is a third item, there is a 1/3 chance it will replace the current selection (again, the math the probability for each item being 1/3)
  4. If there is a fourth item, there is a 1/4 chance it will replace the current selection
  5. ... etc ...

Note that you should be able to compute a 1/x chance by saying rand.Next(0,x) == 0 (or any other integer between 0 and x - 1 inclusive; you don't have to bother using total - 1.

It's actually a pretty neat approach; initially I thought there wasn't going to be any good way of doing what you were asking!

answered Mar 7, 2010 at 6:06
Sign up to request clarification or add additional context in comments.

4 Comments

I am pretty bad at probability and statistics but this looks like the best option without knowing the upper bound ahead of time.
Bingo, that is what I was thinking in my rambling description. Can anyone out there find a fault with this, as far as picking random items goes?
@Jon, no fault to be found with this appoach: it's a very classic, even traditional algorithm, published e.g. in Knuth's "Art of Computer Programming" books -- see for example geomblog.blogspot.com/2008/01/happy-birthday-don-knuth.html .
This is indeed a well known algorithm. The generalised form, for picking several elements, is called Reservoir Sampling (en.wikipedia.org/wiki/Reservoir_sampling).
2

Your approach looks good, yes.

1 item = gets selected

2 items = 50% chance you pick the 2nd item to replace the 1st

3 items = 33% chance you pick the 3rd item, 67% chance you pick one of first two items

4 items = 25% chance you pick 4th item, 75% chance you pick ...

...

So contrary to most of the other responses here I think you have a working solution that gives an even probability distribution.

You could simplify the random check:

 int rnd = random.Next(0, total);
 if (rnd == 0)

As it doesn't matter which of the total-1 values you test for to get the 1/n probability.

answered Mar 7, 2010 at 6:19

Comments

0

we can prove it by induction.
it is true for 1;
if it is true for n; it is true for n+1;
=> prob. of selection for first n elements = 1/n
=> sice prob. of selecting (n+1)th element is 1/(n+1)
=> prob of not selecting (n+1)th element is n/(n+1)
=> prob of selection for first n elements after adding (n+1)th element = 1/n*(n/n+1)=1/n+1

answered Jul 10, 2010 at 16:22

Comments

-1

In your first code snippet, you use items.count, so you know how many elements you have. You need to know this number so that each element has an equal chance of being selected.

As you wrote, you generate a random number i such that 0 <= i < items.count, and then you try to quickly access element i of the list. (A linked list might not be a good choice of data structure.)

If you have a good estimate N of the number of items, you can use this instead of items.count.

In your second code snippet, you might have to initialize "total" to zero.

answered Mar 7, 2010 at 6:08

Comments

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.