Return to Answer

replaced http://codereview.stackexchange.com/ with https://codereview.stackexchange.com/

edited Apr 13, 2017 at 12:40

The big issue is that, as JS1 JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort randint as we go and then Fisher-Yates shuffle at the end. One uses twice as much memory; the other uses twice as many calls to get_random_int(). Both use insertion sort, and leave the overall running time as quadratic.

The big issue is that, as JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort randint as we go and then Fisher-Yates shuffle at the end. One uses twice as much memory; the other uses twice as many calls to get_random_int(). Both use insertion sort, and leave the overall running time as quadratic.

Correct error pointed out in comments

Source Link

edited Feb 10, 2017 at 14:58

Peter Taylor

edited Feb 10, 2017 at 14:58

Peter Taylor

24.4k
1
49
94

A simple improvement which requires only dstruct_size calls to get_random_int and overall quadratic time is the following (not testedpseudocode):

for (idx = 0; idx < dstruct_size; idx++) {
 tmp = (uint)get_random_int() % (max_range - idx);
 for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
 // Discussed below
 if (tmp >= randint[cmpIdx]sorted(randint)[cmpIdx]) tmp++;
 }
 randint[idx] = tmp;
}

Obviously it needs minor amendment to handle the case max_range = 0, but it'sthat's not too difficult, especially if working with uint.

The Fisher-Yates option, however, does allow a fairly ~~(削除) tricky (削除ここまで)~~ elegant insertion.

for (i = 0; i < dstruct_size; i++) {
 tmp = (uint)get_random_int() % (max_range - i);
 for (j = 0; j < i; j++) {
 if (tmp >= randint[j]) tmp++;
 else {
 tmp2 = tmp;
 tmp = randint[j];
 randint[j] = tmp2;
 }
 }
 randint[i] = tmp;
}
// TODO Now Fisher-Yates shuffle

Observe that because randint contains sorted distinct values, once we find the insertion point for tmp we will never again hit the first case. In fact, that means we could optimise the inner loop and assignment to randint[i] as

 for (j = 0; j < i && tmp >= randint[j]; j++, tmp++) {
 // Empty loop body
 }
 for (; j <= i; j++) {
 tmp2 = tmp;
 tmp = randint[j];
 randint[j] = tmp2;
 }

A simple improvement which requires only dstruct_size calls to get_random_int and overall quadratic time is the following (not tested):

for (idx = 0; idx < dstruct_size; idx++) {
 tmp = (uint)get_random_int() % (max_range - idx);
 for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
 if (tmp >= randint[cmpIdx]) tmp++;
 }
 randint[idx] = tmp;
}

Obviously it needs minor amendment to handle the case max_range = 0, but it's not too difficult, especially if working with uint.

A simple improvement which requires only dstruct_size calls to get_random_int and overall quadratic time is the following (pseudocode):

for (idx = 0; idx < dstruct_size; idx++) {
 tmp = (uint)get_random_int() % (max_range - idx);
 for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
 // Discussed below
 if (tmp >= sorted(randint)[cmpIdx]) tmp++;
 }
 randint[idx] = tmp;
}

Obviously it needs minor amendment to handle the case max_range = 0, but that's not too difficult, especially if working with uint.

The Fisher-Yates option, however, does allow a fairly ~~(削除) tricky (削除ここまで)~~ elegant insertion.

for (i = 0; i < dstruct_size; i++) {
 tmp = (uint)get_random_int() % (max_range - i);
 for (j = 0; j < i; j++) {
 if (tmp >= randint[j]) tmp++;
 else {
 tmp2 = tmp;
 tmp = randint[j];
 randint[j] = tmp2;
 }
 }
 randint[i] = tmp;
}
// TODO Now Fisher-Yates shuffle

 for (j = 0; j < i && tmp >= randint[j]; j++, tmp++) {
 // Empty loop body
 }
 for (; j <= i; j++) {
 tmp2 = tmp;
 tmp = randint[j];
 randint[j] = tmp2;
 }

Source Link

answered Feb 10, 2017 at 8:52

Peter Taylor

answered Feb 10, 2017 at 8:52

Peter Taylor

24.4k
1
49
94

What are the valid input ranges? The documentation flags two special cases and the code handles them as special cases, but

I don't see any checks for dstruct_size < 0
If max_range is very large, the implementation of mod is incorrect. Consider e.g. mod(1 << 30, (1 << 30) + 1): the addition overflows into a negative value.

On the assumption that the intention is to guarantee that max_range will be zero or positive, why int instead of uint? Using uint would encode the guarantee of non-negative return values, and would remove the need for the mod helper.

 /* Check if 'tmp' is already in array */

 if (!_is_repeated_int(randint, idx, tmp)) {
 randint[idx] = tmp;
 idx++;
 }

is very inefficient. Consider the worst case: dstruct_size == max_range. The expected number of calls to get_random_int will be quadratic in max_range, and the overall running time of the method will be cubic.

A simple improvement which requires only dstruct_size calls to get_random_int and overall quadratic time is the following (not tested):

for (idx = 0; idx < dstruct_size; idx++) {
 tmp = (uint)get_random_int() % (max_range - idx);
 for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
 if (tmp >= randint[cmpIdx]) tmp++;
 }
 randint[idx] = tmp;
}

Obviously it needs minor amendment to handle the case max_range = 0, but it's not too difficult, especially if working with uint.

I've followed your code in taking get_random_int() modulo a value, and that's ok if you don't need uniformity. However, in some cases the fact that get_random_int() % m is more likely to be 0 than m-1 (unless m is a power of two) is a problem. In that case you'd want an auxiliary get_uniform_random_int(m).

lang-c