The big issue is that, as JS1 JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort randint
as we go and then Fisher-Yates shuffle at the end. One uses twice as much memory; the other uses twice as many calls to get_random_int()
. Both use insertion sort, and leave the overall running time as quadratic.
The big issue is that, as JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort randint
as we go and then Fisher-Yates shuffle at the end. One uses twice as much memory; the other uses twice as many calls to get_random_int()
. Both use insertion sort, and leave the overall running time as quadratic.
The big issue is that, as JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort randint
as we go and then Fisher-Yates shuffle at the end. One uses twice as much memory; the other uses twice as many calls to get_random_int()
. Both use insertion sort, and leave the overall running time as quadratic.
A simple improvement which requires only dstruct_size
calls to get_random_int
and overall quadratic time is the following (not testedpseudocode):
for (idx = 0; idx < dstruct_size; idx++) {
tmp = (uint)get_random_int() % (max_range - idx);
for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
// Discussed below
if (tmp >= randint[cmpIdx]sorted(randint)[cmpIdx]) tmp++;
}
randint[idx] = tmp;
}
Obviously it needs minor amendment to handle the case max_range = 0
, but it'sthat's not too difficult, especially if working with uint
.
The big issue is that, as JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort randint
as we go and then Fisher-Yates shuffle at the end. One uses twice as much memory; the other uses twice as many calls to get_random_int()
. Both use insertion sort, and leave the overall running time as quadratic.
The Fisher-Yates option, however, does allow a fairly (削除) tricky (削除ここまで) elegant insertion.
for (i = 0; i < dstruct_size; i++) {
tmp = (uint)get_random_int() % (max_range - i);
for (j = 0; j < i; j++) {
if (tmp >= randint[j]) tmp++;
else {
tmp2 = tmp;
tmp = randint[j];
randint[j] = tmp2;
}
}
randint[i] = tmp;
}
// TODO Now Fisher-Yates shuffle
Observe that because randint
contains sorted distinct values, once we find the insertion point for tmp
we will never again hit the first case. In fact, that means we could optimise the inner loop and assignment to randint[i]
as
for (j = 0; j < i && tmp >= randint[j]; j++, tmp++) {
// Empty loop body
}
for (; j <= i; j++) {
tmp2 = tmp;
tmp = randint[j];
randint[j] = tmp2;
}
A simple improvement which requires only dstruct_size
calls to get_random_int
and overall quadratic time is the following (not tested):
for (idx = 0; idx < dstruct_size; idx++) {
tmp = (uint)get_random_int() % (max_range - idx);
for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
if (tmp >= randint[cmpIdx]) tmp++;
}
randint[idx] = tmp;
}
Obviously it needs minor amendment to handle the case max_range = 0
, but it's not too difficult, especially if working with uint
.
A simple improvement which requires only dstruct_size
calls to get_random_int
and overall quadratic time is the following (pseudocode):
for (idx = 0; idx < dstruct_size; idx++) {
tmp = (uint)get_random_int() % (max_range - idx);
for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
// Discussed below
if (tmp >= sorted(randint)[cmpIdx]) tmp++;
}
randint[idx] = tmp;
}
Obviously it needs minor amendment to handle the case max_range = 0
, but that's not too difficult, especially if working with uint
.
The big issue is that, as JS1 pointed out in comments, the loop which increments to avoid collisions needs to process the previous numbers in ascending order. There are two ways to do this: either keep a second, temporary, buffer with the numbers sorted; or sort randint
as we go and then Fisher-Yates shuffle at the end. One uses twice as much memory; the other uses twice as many calls to get_random_int()
. Both use insertion sort, and leave the overall running time as quadratic.
The Fisher-Yates option, however, does allow a fairly (削除) tricky (削除ここまで) elegant insertion.
for (i = 0; i < dstruct_size; i++) {
tmp = (uint)get_random_int() % (max_range - i);
for (j = 0; j < i; j++) {
if (tmp >= randint[j]) tmp++;
else {
tmp2 = tmp;
tmp = randint[j];
randint[j] = tmp2;
}
}
randint[i] = tmp;
}
// TODO Now Fisher-Yates shuffle
Observe that because randint
contains sorted distinct values, once we find the insertion point for tmp
we will never again hit the first case. In fact, that means we could optimise the inner loop and assignment to randint[i]
as
for (j = 0; j < i && tmp >= randint[j]; j++, tmp++) {
// Empty loop body
}
for (; j <= i; j++) {
tmp2 = tmp;
tmp = randint[j];
randint[j] = tmp2;
}
What are the valid input ranges? The documentation flags two special cases and the code handles them as special cases, but
- I don't see any checks for
dstruct_size < 0
- If
max_range
is very large, the implementation ofmod
is incorrect. Consider e.g.mod(1 << 30, (1 << 30) + 1)
: the addition overflows into a negative value.
On the assumption that the intention is to guarantee that max_range
will be zero or positive, why int
instead of uint
? Using uint
would encode the guarantee of non-negative return values, and would remove the need for the mod
helper.
/* Check if 'tmp' is already in array */
if (!_is_repeated_int(randint, idx, tmp)) {
randint[idx] = tmp;
idx++;
}
is very inefficient. Consider the worst case: dstruct_size == max_range
. The expected number of calls to get_random_int
will be quadratic in max_range
, and the overall running time of the method will be cubic.
A simple improvement which requires only dstruct_size
calls to get_random_int
and overall quadratic time is the following (not tested):
for (idx = 0; idx < dstruct_size; idx++) {
tmp = (uint)get_random_int() % (max_range - idx);
for (cmpIdx = 0; cmpIdx < idx; cmpIdx++) {
if (tmp >= randint[cmpIdx]) tmp++;
}
randint[idx] = tmp;
}
Obviously it needs minor amendment to handle the case max_range = 0
, but it's not too difficult, especially if working with uint
.
I've followed your code in taking get_random_int()
modulo a value, and that's ok if you don't need uniformity. However, in some cases the fact that get_random_int() % m
is more likely to be 0
than m-1
(unless m
is a power of two) is a problem. In that case you'd want an auxiliary get_uniform_random_int(m)
.