1
$\begingroup$

I have a set of objects, and each object has (let's say) a color. Given a distribution such as:

red : 50%
green : 30%
blue : 20%

I need to produce an N-length array of objects with a color distribution as close to this as possible. It is possible that the length of the original list is smaller than N, so duplicates may be required.

Is there an efficient algorithm that might be suited for this kind of task? I imagine there is, but I'm lacking the terminology to find it.

喜欢算法和数学
39.2k4 gold badges35 silver badges95 bronze badges
asked Aug 5, 2019 at 14:04
$\endgroup$

2 Answers 2

1
$\begingroup$

The term you are looking for is "sampling", "sampling without replacement", "sampling with replacement", "generating random samples from a given distribution", or "inverse transform sampling". Lots of nice articles and references can be found if you use those keywords to search.

Here is a simple algorithm to simulate the given distribution, red:50% green:30% blue:20%, using sampling with replacement.

Let arr be an empty array. 
Repeat the following N times
 generate a random float-point number between 0 inclusive and 1 exclusive.
 if the number is between 0 inclusive and 0.50 exclusive, select a random red object.
 if the number is between 0.50 inclusive and 0.80 exclusive, select a random green object.
 if the number is between 0.80 inclusive and 1.0 exclusive, select a random blue object.
 Append the selected object to arr.
Return arr. 
answered Aug 5, 2019 at 14:50
$\endgroup$
1
  • $\begingroup$ Thanks for the terminology, just what I needed. $\endgroup$ Commented Aug 5, 2019 at 15:06
2
$\begingroup$

Do you need to guarantee that, in every sample, 50% of the objects are red, 30% are green and 20% are blue? If so, just make your array consist of 0ドル.5,円N$ red objects chosen uniformly at random, followed by 0ドル.3,円N$ green and 0ドル.2,円N$ blue. You'll need to deal with the rounding error if $N$ isn't divisible by 10ドル$. You can always shuffle the array if it's not desirable to have it sorted by colour.

If you only need to guarantee that the proportions of the colours are correct in the long-run, Apass.Jack's solution will do the job. If $N$ is very large, then each array generation is "the long-run" and will be close to the correct proportions (but not guaranteed to be exact).

answered Aug 5, 2019 at 15:16
$\endgroup$

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.