Many people seed their Mersenne Twister engines like this:
std::mt19937 rng(std::random_device{}());
However, this only provides a single unsigned int
, i.e. 32 bits on most systems, of seed randomness, which seems quite tiny when compared to the 19937 bit state space we want to seed. Indeed, if I find out the first number generated, my PC (Intel i7-4790K) only needs about 10 minutes to search through all 32 bit numbers and find the used seed. (I know that MT is not a cryptographic RNG, but I just did that to get a feel for how small 32 bit really is in these days.)
I am trying to build a function to properly seed a mt19937
and came up with this:
#include <algorithm>
#include <iostream>
#include <random>
auto RandomlySeededMersenneTwister () {
// Magic number 624: The number of unsigned ints the MT uses as state
std::vector<unsigned int> random_data(624);
std::random_device source;
std::generate(begin(random_data), end(random_data), [&](){return source();});
std::seed_seq seeds(begin(random_data), end(random_data));
std::mt19937 seededEngine (seeds);
return seededEngine;
}
int main() {
auto rng = RandomlySeededMersenneTwister();
for (int i = 0; i < 10; ++i)
std::cout << rng() << "\n";
}
This does look like a safe solution to me; however, I have learned that problems with RNG are often times quite subtle.
Provided std::random_device
produces good, random data on my system, does the code give me a correctly seeded std::mt19937
?
4 Answers 4
Well, first off, why do you use a
std::vector
for a comparatively small sequence of known length? A raw array orstd::array
suffice and avoids any dynamic allocation.Next, avoid needless magic numbers. Use
std::mt19937::state_size
instead of manually specifying624
.Why do you use a lambda? A simple
std::ref(source)
suffices.
The seeding itself looks perfectly fine and there's no actual error anywhere in your code.
template<class T = std::mt19937, std::size_t N = T::state_size * sizeof(typename T::result_type)>
auto ProperlySeededRandomEngine () -> typename std::enable_if<N, T>::type {
std::random_device source;
std::random_device::result_type random_data[(N - 1) / sizeof(source()) + 1];
std::generate(std::begin(random_data), std::end(random_data), std::ref(source));
std::seed_seq seeds(std::begin(random_data), std::end(random_data));
return T(seeds);
}
You could avoid the need for random_data
by using counting and transforming iterators as detailed in "Sequence iterator? Isn't there one in boost? ".
This is not simpler, but maybe more efficient:
template<class T = std::mt19937, std::size_t N = T::state_size * sizeof(typename T::result_type)>
T ProperlySeededRandomEngine () {
std::random_device source;
auto make_iter = [&](std::size_t n) {
return boost::make_transform_iterator(
boost::counting_iterator<std::size_t>(n), [&](size_t){return source();});
};
std::seed_seq seeds(make_iter(0), make_iter((N - 1) / sizeof(source()) + 1));
return T(seeds);
}
If you can upgrade to C++20, use ranges and views (godbolt):
template<class T = std::mt19937, std::size_t N = T::state_size * sizeof(typename T::result_type)>
T ProperlySeededRandomEngine () {
std::random_device source;
auto random_data = std::views::iota(std::size_t(), (N - 1) / sizeof(source()) + 1)
| std::views::transform([&](auto){ return source(); });
std::seed_seq seeds(std::begin(random_data), std::end(random_data));
return T(seeds);
}
I wrote more or less exactly the same function for my own use so of course I think it is pretty awesome. ;-)
Two things that I would do differently (only style, not security):
Don't hard-code the magic number 624. The
std::mersenne_twister_engine
template
class
has astatic
constexpr
memberword_size
that you can use instead. Likewise, instead ofunsigned
, prefer usingresult_type
.Consider making the function a
template
so it can be used forstd::mt19937_64
(and maybe other compatible engines) as well.
-
\$\begingroup\$ Does using monospace for
template
,class
,static
, andconstexpr
improve the clarity of what you're saying? If so, you missedfor
,do
, andusing
. \$\endgroup\$bcrist– bcrist2015年10月31日 06:47:26 +00:00Commented Oct 31, 2015 at 6:47 -
3\$\begingroup\$ @bcrist No, because in the places I used these words, I am using them with their natural language meaning, not to refer to a C++ keyword. If I were talking about a
for
loop or ausing
directive, I would have made that clear by using the appropriate font. Likewise, I would writeint
if I refer to the C++ data type and say "integer" if I refer to the mathematical concept of numbers. \$\endgroup\$5gon12eder– 5gon12eder2015年10月31日 10:43:01 +00:00Commented Oct 31, 2015 at 10:43 -
5\$\begingroup\$ What I was trying to humorously point out is that template class, static, and constexpr are not just references to C++ keywords, they are natural language concepts in the vocabulary of any competent C++ dev. It is the concept that's important, not the token that represents it. Furthermore, in the case of describing
mersenne_twister_engine
, usingclass
instead of class is actually incorrect. The standard does not mandate that it (or any otherstd
type) use theclass
keyword. An implementation could just as easily usestruct
if it so desires. \$\endgroup\$bcrist– bcrist2015年10月31日 16:25:16 +00:00Commented Oct 31, 2015 at 16:25 -
2\$\begingroup\$ @bcrist: Kudos for a way to be more precise by getting more colloquial. \$\endgroup\$Deduplicator– Deduplicator2015年11月01日 18:16:00 +00:00Commented Nov 1, 2015 at 18:16
I can offer a different possibility of correctly initializing a mt19937 random number generator:
auto RandomlySeededMersenneTwister () {
std::mt19937 rng(std::random_device{}());
rng.discard(700000);
return rng;
}
According to this paper ("Improved long-period generators based on linear recurrences modulo 2", F. Panneton, P. L'Ecuyer, M. Matsumoto in AVM TOMS Volume 32 Issue 1, March 2006 Pages 1-16), especially figure 4 in chapter 7, one can obtain a mersenne twister state of high quality even when its initialization variables have only a few bits set to a non-zero value. You need to perform about 700000 random number generations (or twists) during/after the initialization phase.
This certainly extends to your case and is much easier to implement. It is, of course, much slower, but random number generator initialization should always only be triggered once in every binary, so this is probably negligible in many cases.
Also, this solution makes it much easier to reproduce results, as it is not needed to save 624 numbers but only one.
Another advantage is the fact, that you can be sure that you get a nice equal distribution. The original solution does not guarantee this property and relies instead on the quality of the source random number generator and its interaction with mt19937.
-
\$\begingroup\$ Thanks for writing that review. This was my first thought as well and could be trivially implemented using
std::discard_block_engine::discard()
. \$\endgroup\$Edward– Edward2015年11月02日 12:46:30 +00:00Commented Nov 2, 2015 at 12:46 -
\$\begingroup\$ Fortunately, you understood what I meant (
std::mersenne_twister::discard()
) rather than what I wrote! That takes less than 5 milliseconds on my 64-bit Linux box running a dual-core processor at 2.3GHz. \$\endgroup\$Edward– Edward2015年11月02日 15:49:30 +00:00Commented Nov 2, 2015 at 15:49 -
7\$\begingroup\$ Burn-in is good, but it does not change the fact that you seed a 624 byte state with just a single integer. \$\endgroup\$Johan Lundberg– Johan Lundberg2016年04月20日 19:46:54 +00:00Commented Apr 20, 2016 at 19:46
-
6\$\begingroup\$ It does matter. After you seed with a 32 bit int and skip 700k iterations, there is still only 2^32 possible internal states. The paper you are referring to is talking about recovering from a bad internal state, this is not the problem at hand - if you pass a single number to the Mt constructor it uses a seed sequence to fill all its internal state, so you don't start with an almost all zeroes state. But this process (just like discarding) is deterministic, which is the problem the original question is talking about, and discarding does not fix it. \$\endgroup\$etarion– etarion2016年06月30日 13:01:22 +00:00Commented Jun 30, 2016 at 13:01
-
1\$\begingroup\$ @etarion His remark " (I know that MT is not a cryptographic RNG, I just did that to get a feel for how small 32 bit really is in these days.)" also shows that he is not really keen on the safety aspect of his random numbers. And if you are not looking for a safe RNG my solution works just fine (and you can't use a MT anyways...). \$\endgroup\$Nils_M– Nils_M2016年07月01日 16:05:35 +00:00Commented Jul 1, 2016 at 16:05
Yes, it does technically provide you with a 'correctly' seeded std::mt19937
PRNG. However, the approach is comparatively clumsy - though you're only offering it as example code, to be fair. You can do without explicit magic numbers like (624)
- for the required 19937
bits.
header:
#include <random>
class X
{
...
mutable std::mt19937 rng; // mutable is a design decision.
...
// thanks to Konrad Rudolph for correcting me on potentially misleading
// typedefs, that may be misleading with other RNGs.
};
implementation:
#include <algorithm>
#include <functional>
X::X ()
{
std::random_device rdev;
std::seed_seq::result_type data[std::mt19937::state_size];
std::generate_n(data, std::mt19937::state_size, std::ref(rdev));
std::seed_seq prng_seed (data, data + std::mt19937::state_size);
rng.seed(prng_seed);
}
The member X::rng
has a full random 19937-bit state ready for use. Of course this will depend on the policy of std::random_device
. Not only do you have an essentially inexhaustible period of (2^19937 - 1)
when using rng
itself - you also have a random state vector of 19937
bits. The likelihood of re-creating the same initial PRNG state is now ~ (2^-19937), as opposed to (2^-32). Furthermore, all mt19937 states are theoretically reachable with this code.
-
1\$\begingroup\$ I think your code assumes that
std::seed_seq::result_type
==random_engine::result_type
. This isn’t generally true (e.g. forstd::mt19937_64
). In this code your seed sequence will be generated using 64unsigned int
s (=std::random_device::value_type
), which isn’t sufficient entropy. Am I wrong? \$\endgroup\$Konrad Rudolph– Konrad Rudolph2020年03月17日 10:28:31 +00:00Commented Mar 17, 2020 at 10:28 -
1\$\begingroup\$ My point is that your code is intentionally keeping the random engine generic (though short of making it a template parameter) but your seeding code is specific to mt19937. If the specificity is intentional (which is fine), it’s dangerous to hide the connection to a specific RNG behind a generically named typedef. \$\endgroup\$Konrad Rudolph– Konrad Rudolph2020年03月20日 08:37:03 +00:00Commented Mar 20, 2020 at 8:37
-
\$\begingroup\$ That said, I should make it clear that I’m not actually taking issue with your code, my comment was really just a question to confirm my understanding that a 64-bit MT19937 would need to be seeded differently (i.e. with more random bits). \$\endgroup\$Konrad Rudolph– Konrad Rudolph2020年03月20日 08:42:37 +00:00Commented Mar 20, 2020 at 8:42
-
\$\begingroup\$ @KonradRudolph - Now I understand your point. You're right. My code looks 'generic', but it is not; and it could be a source of confusion. It might be an 'example', but I don't want anyone using it for 'cargo cult' coding. \$\endgroup\$Brett Hale– Brett Hale2020年03月20日 09:32:00 +00:00Commented Mar 20, 2020 at 9:32
std::random_device
does not work correctly on every platform (I'm looking at you MinGW). If you're going the Boost way, then you could useboost::random_device
instead. \$\endgroup\$std::mt19937 eng{boost::random_device{}}
(uses astd::seed_seq
as the device is fed) unlikestd::mt19937 eng{std::random_device{}()}
(mind the parenthesis; uses anunsigned
as just an unsigned integer is fed). Refer pcg-random.org's article. \$\endgroup\$