I tried replacing the PRNG in LuaJIT 2.0.5 lib_math.c with xoshiro256**, but
did not see any math.random speedup over the stock Tausworthe generator.
After swapping s[0] and s[1] and scrambling the updated state instead, it
matched LuaJIT's Tausworthe speed. (Note: it is still xoshiro256**.)
LuaJIT, 1 billion math.random() calls, best of 3:
49.2s  Tausworthe: LuaJIT stock PRNG
50.1s  Xoshiro256**: http://xoshiro.di.unimi.it/xoshiro256starstar.c
48.9s  Xoshiro256** (modified): lj_math_random_step() patch below
static inline uint64_t rotl(const uint64_t x, int k) {
  return (x << k) | (x >> (64 - k));
}
LJ_NOINLINE uint64_t LJ_FASTCALL lj_math_random_step(RandomState *rs)
{
  uint64_t *s = rs->gen;    /* modified xoshiro256** */
  uint64_t t = s[0] << 17;  /* s[0], s[1] swapped */
  s[2] ^= s[1];
  s[3] ^= s[0];
  s[0] ^= s[2];
  s[1] ^= s[3];
  s[2] ^= t;
  s[3] = rotl(s[3], 45);
  uint64_t r = rotl(s[0] * 5, 7) * 9;  /* scramble updated state */
  /* Keep the low 52 bits as mantissa; set the exponent for a double in [1,2). */
  return (r & U64x(000fffff,ffffffff)) | U64x(3ff00000,00000000);
}
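To back up the "it is still xoshiro256**" note, here is a minimal standalone sketch (not part of the patch) that compares the raw 64-bit outputs of the reference step and the modified step. The names ref_next, mod_next and streams_match and the seed values are made up for illustration. The assumption being checked: if the modified generator is seeded with s[0] and s[1] exchanged, its outputs equal the reference outputs shifted by one step, because it scrambles the state after the update instead of before.

```c
#include <stdint.h>

static inline uint64_t rotl64(uint64_t x, int k) {
  return (x << k) | (x >> (64 - k));
}

/* Reference xoshiro256**: scramble s[1] first, then update the state. */
static uint64_t ref_next(uint64_t s[4]) {
  uint64_t result = rotl64(s[1] * 5, 7) * 9;
  uint64_t t = s[1] << 17;
  s[2] ^= s[0];
  s[3] ^= s[1];
  s[1] ^= s[2];
  s[0] ^= s[3];
  s[2] ^= t;
  s[3] = rotl64(s[3], 45);
  return result;
}

/* Modified step from the patch, without the double conversion:
   s[0] and s[1] trade roles, and the updated state is scrambled. */
static uint64_t mod_next(uint64_t s[4]) {
  uint64_t t = s[0] << 17;
  s[2] ^= s[1];
  s[3] ^= s[0];
  s[0] ^= s[2];
  s[1] ^= s[3];
  s[2] ^= t;
  s[3] = rotl64(s[3], 45);
  return rotl64(s[0] * 5, 7) * 9;
}

/* Returns 1 if the first n outputs of mod_next equal the reference
   outputs 2..n+1, given seeds that differ only by the s[0]/s[1] swap. */
static int streams_match(int n) {
  uint64_t a[4] = {1, 2, 3, 4};              /* arbitrary nonzero seed */
  uint64_t b[4] = {a[1], a[0], a[2], a[3]};  /* same seed, swapped */
  int i;
  (void)ref_next(a);  /* discard the one output the modified stream skips */
  for (i = 0; i < n; i++) {
    if (ref_next(a) != mod_next(b)) return 0;
  }
  return 1;
}
```

Calling streams_match(1000) from a small main() should return 1 if the two variants really are the same generator up to that one-step offset.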
Vigna's recommendation for producing a random double from xoshiro256**:
> There's no detectable difference between the bits. Theoretically, however,
> the upper bits have a slightly higher linear complexity, so if you don't have
> any other criterion I'd say to use the high bits.
>
> -- Vigna 5/9/2018
So a better patch may be to use the high bits:
< return (r & U64x(000fffff,ffffffff)) | U64x(3ff00000,00000000);
> return (r >> 12) | U64x(3ff00000,00000000);
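For comparison, here is a small sketch of what the two return expressions do to a raw 64-bit value r. It assumes IEEE-754 binary64 doubles and assumes the caller subtracts 1.0 from the resulting [1,2) double to get a value in [0,1). The helper names (bits_to_unit, from_low_bits, from_high_bits) are hypothetical and not LuaJIT functions.

```c
#include <stdint.h>
#include <string.h>

/* Put 52 random bits in the mantissa of a double with exponent 0x3ff,
   giving a value in [1,2), then shift it down to [0,1). */
static double bits_to_unit(uint64_t bits52) {
  uint64_t u = bits52 | UINT64_C(0x3ff0000000000000);
  double d;
  memcpy(&d, &u, sizeof d);  /* reinterpret the bit pattern as a double */
  return d - 1.0;
}

/* Original patch: keep the low 52 bits of r. */
static double from_low_bits(uint64_t r) {
  return bits_to_unit(r & UINT64_C(0x000fffffffffffff));
}

/* Alternative: keep the high 52 bits of r. */
static double from_high_bits(uint64_t r) {
  return bits_to_unit(r >> 12);
}
```

Both variants use exactly 52 bits of r, so the cost should be about the same (one shift instead of one mask). The only difference is which end of the scrambled word feeds the mantissa.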