Simple dataset class in C++

Question 1

As part of my training, I implemented a dataset class in C++ that allows me to create synthetic datasets to get more familiar with features that C++ offers such as object oriented programming.

This implementation uses a simple method to create two intertwined spirals as shown in the following image.

enter image description here

The output looks as follows:

x y label
-3.85616 -0.987058 0
1.16 4.6764 0
2.66279 2.13692 1
-0.643551 3.81989 0
-0.252046 -0.888411 0

Please be as hard as possible with this implementation and give me constructive feedback.

I would appreciate advice especially in the following areas:

Code style (readability, naming conventions, etc...)
Class design
Efficiency (how to avoid unnecessary complexity)
Reinventing the wheel (does the STL offer functionality that I should use?)

main.cpp

#include <iostream>
#include <vector>
#include "data.hpp"
int main() {
 unsigned int n_points = 3000;
 Data data (n_points);
 std::vector<Point> d;
 d = data.get();
 for (std::vector<Point>::iterator it = d.begin(); it != d.end(); ++it) {
 std::cout << it->x << " " << it->y << " " << it->label << std::endl;
 }
 return 0;
}

data.hpp

#ifndef DATA_H
#define DATA_H
#include <vector>
#include <cmath>
#include "util.hpp"
class Data {
 public:
 Data (unsigned int);
 unsigned int n_samples;
 unsigned int n_dims = 2;
 unsigned int n_classes = 2;
 std::vector<Point> get();
 private:
 Random random;
 std::vector<Point> data;
 std::vector<Point> generate_data();
};
Data::Data (unsigned int _n_samples) {
 n_samples = _n_samples;
}
std::vector<Point> Data::generate_data() {
 std::vector<Point> data(n_samples);
 std::vector<Point>::iterator it;
 double noise = 1.0;
 double spread = 4.0;
 for (it = data.begin(); it != data.end(); ++it) {
 double r = 3.14 * spread * random.rand();
 double r1 = -cos(r) * r + noise * random.rand();
 double r2 = sin(r) * r + noise * random.rand();
 if (random.rand() < 0.5) {
 it->x = r1;
 it->y = r2;
 it->label = 1;
 } else {
 it->x = -r1;
 it->y = -r2;
 it->label = 0;
 } 
 }
 return data;
}
std::vector<Point> Data::get() {
 std::vector<Point> data(n_samples);
 std::vector<Point>::iterator it;
 data = generate_data();
 return data;
}
#endif /* DATA_H */

util.hpp

#ifndef UTIL_H
#define UTIL_H
#include <cstdlib>
#include <ctime>
struct Point {
 double x;
 double y;
 unsigned int label;
};
class Random {
 public:
 Random ();
 double rand();
};
Random::Random () {
 std::srand(static_cast<unsigned int>(std::time(nullptr)));
}
double Random::rand() {
 return static_cast<double> (std::rand()) / static_cast<double> (RAND_MAX + 1U);
}
#endif /* UTIL_H */

Question 2

Here are some things that may help you improve your program.

Separate interface from implementation

The interface goes into a header file and the implementation (that is, everything that actually emits bytes including all functions and data) should be in a separate .cpp file. The reason is that you might have multiple source files including the .h file but only one instance of the corresponding .cpp file. In other words, split your existing data.hpp and util.hpp files into a .h file and a .cpp file.

Understand the use of namespaces

Your code is correctly using <cmath> instead of <math.h>. The difference between the two forms is that the former defines things within the std:: namespace versus into the global namespace. So this implies that instead of using cos and sin, the code should be using std::cos and std::sin

Use `const` where practical

There are a number of named constants in the program such as unsigned int n_dims = 2;. It's very good practice to name these, but even better is to declare them const or better still constexpr. The reason is that it allows the compiler to perform better optimizations without sacrificing readability.

Don't use `std::endl` if you don't really need it

The difference betweeen std::endl and '\n' is that '\n' just emits a newline character, while std::endl actually flushes the stream. This can be time-consuming in a program with a lot of I/O and is rarely actually needed. It's best to only use std::endl when you have some good reason to flush the stream and it's not very often needed for simple programs such as this one. Avoiding the habit of using std::endl when '\n' will do will pay dividends in the future as you write more complex programs with more I/O and where performance needs to be maximized.

Use "range `for`" and simplify your code

The code currently includes these lines:

for (std::vector<Point>::iterator it = d.begin(); it != d.end(); ++it) {
 std::cout << it->x << " " << it->y << " " << it->label << std::endl;
}

Since C++11, however, we have a range for which simplifies it to this:

for (const auto& point : d) {
 std::cout << point.x << " " << point.y << " " << point.label << '\n';
}

Write a stream inserter for your class

We can further refine the for loop above by creating a custom inserter for the Point class:

std::ostream& operator<<(std::ostream& out, const Point& point) {
 return out << point.x << " " << point.y << " " << point.label;
}

Now our loop in main is even simpler:

for (const auto& point : d) {
 std::cout << point << '\n';
}

Use standard library algorithms

We can use std::copy to copy from the vector to std::cout, replacing the loop with a single statement:

std::copy(d.begin(), d.end(), std::ostream_iterator<Point>(std::cout, "\n"));

Consider using a better random number generator

If you are using a compiler that supports at least C++11, consider using a better random number generator. In particular, instead of rand, you might want to look at std::uniform_real_distribution and friends in the <random> header. Also, instead of doing this:

if (random.rand() < 0.5) {
 it->x = r1;
 it->y = r2;
 it->label = 1;
} else {
 it->x = -r1;
 it->y = -r2;
 it->label = 0;
}

I would suggest using a std::bernoulli_distribution.

Rethink the classes

Rather than having the vaguely named Data class creating Points, I'd suggest moving that functionality into the Point class. Since this is actually creating a Point, it makes sense to me to create a custom constructor with this functionality:

Point::Point() {
 static std::random_device rd;
 static std::mt19937 gen(rd());
 static std::uniform_real_distribution<> spread(0.0, 4.0*3.14);
 static std::uniform_real_distribution<> noise(0.0, 1.0);
 static std::bernoulli_distribution type(0.5);
 static constexpr double multiplier[]{-1.0, +1.0};
 label = type(gen);
 double r = spread(gen);
 x = multiplier[label] * (-std::cos(r) * r + noise(gen));
 y = multiplier[label] * (+std::sin(r) * r + noise(gen));
}

Minimize memory use

It's not critical for this program, but rather than storing everything in a vector only to print it and then destroy the vector, a more memory efficient approach would be to generate each point as needed and emit it directly. Using the Point constructor and stream inserter as defined above, this becomes trivial.

Result

Here's the entire program as rewritten with all of these ideas:

#include <random>
#include <iostream>
#include <iterator>
#include <cmath>
struct Point {
 Point();
 double x;
 double y;
 unsigned int label;
};
Point::Point() {
 static std::random_device rd;
 static std::mt19937 gen(rd());
 static std::uniform_real_distribution<> spread(0.0, 4.0*3.14);
 static std::uniform_real_distribution<> noise(0.0, 1.0);
 static std::bernoulli_distribution type(0.5);
 static constexpr double multiplier[]{-1.0, +1.0};
 label = type(gen);
 double r = spread(gen);
 x = multiplier[label] * (-std::cos(r) * r + noise(gen));
 y = multiplier[label] * (+std::sin(r) * r + noise(gen));
}
std::ostream& operator<<(std::ostream& out, const Point& point) {
 return out << point.x << " " << point.y << " " << point.label;
}
int main() {
 for (auto i{3000}; i; --i) {
 std::cout << Point() << '\n';
 }
}

Question 3

Rather than for (auto i{3000}; i; --i) ..., you could end with std::generate_n(std::ostream_iterator<Point>(std::cout, "\n"), 3000, []{ return Point(); })'

Question 4

@Caleth: True, and I considered it but opted for the simpler version shown.

Question 5

You are seeding the pseudorandom number generator with a single unsigned int from the random device. The state size of std::mt19937 is 19937 bits, so it is unlikely (although technically permitted by the standard) that a single unsigned int will be enough: it will work but will only allow access to a tiny subset of the possible initial states. It would be better to use a std::seed_seq instead.

Question 6

@antispinwards: you're right that the current seed is not very good. However, I'm not sure how to do better, even with std::seed_seq. See this for example. Ideas on how to better address this would be most welcome!

Question 7

@Edward - yes, std::seed_seq is less than ideal (barring the C++ folks having a fit of sanity and bugfixing it in a way that doing the correct thing with sufficient data is given priority over doing something with insufficient data), but that doesn't mean we should just throw up our hands and seed the generator in a way that can provide at most 2^32 possibilities (for common definitions of unsigned int) out of 2^19937. 2^32 isn't a particularly large number these days for this kind of thing.

Question 8

This is a nice, readable piece of code. I would just highlight a few things you might want to consider.

Unnecessary Class

Let's take a close look at Random class. It holds no state, just a member function rand. In addition to having just one member function, it is only used in one place, in member function generate_data. Classes are meant to be reusable and Random does not serve that purpose. A common solution is to create a free function that randomly generate numbers and returns the output. Prefer free-functions to member-functions.

Readability Review

It is nice to spell out the iterator name fully but most times, auto keyword looks more readable. This means this code

for (std::vector<Point>::iterator it = d.begin(); it != d.end(); ++it) {
 std::cout << it->x << " " << it->y << " " << it->label << std::endl;
}

can be rewritten as

for(const auto& point : points)
 std::cout << point << '\n'

NOTE: You have to define std::cout that takes your class as it's second argument.

Use good namings

There is much going on in generate_data and can get someone easily confused. avoid one letter variable as much as possible. Anyone reading your code without proper context on what r and x means would easily get confused. Your names should carry intent and they should explain what's going at first glance.

Nitpicks

This piece of code looks ugly

std::vector<Point> Data::generate_data() {
 std::vector<Point> data(n_samples); 
 ....
}

which data are we really assigning to member-variable data or local-variable data? In this case, we are creating a local-variable data. Avoid shadowing your member-variables.

Prefer member-initialization to assignment. This piece of code

Data::Data (unsigned int _n_samples) { // initialized in member initialization list
 n_samples = _n_samples;
}

creates and default initializes n_samples and also make the assignment. This could have been simply initialized at creation time. A proper way of doing that is as follow

Data::Data (unsigned int _n_samples) :
 n_samples{_n_samples}

Your naming convention are inconsistent, sometimes you prefix with _ other times you don't. Choose a style and stick with it.

Question 9

Perspective

The answers that were already provided are great for solving the particular problem you have. I will provide feedback based on my current workflow. I implement a tool for those who would like to make quick C++ prototypes without much regard to engineering, which I then fix and make more of a complete solution.

Underlying problem

What you want to do is a language to describe relation of random variables. The rest is about loops and choosing distributions, sources of random variables, etc. Embedded languages in C++ are written either using macros or expression templates. Since we don't need any source level modifications, lets stick with the latter.

The moving parts

There are already many libraries for random number generation. Some do not like standard C++ facilities for that, but might have desirable properties not provided by standard facilities (for example curand has GPU support, which is very useful in case you need to generate a lot of data). We need a way to latch onto existing libraries. The most feasible way in my opinion is to provide CRTP classes that will add the layer we need:

Providing distribution support
Strict typing
Container filling (std::ranges, iterators)

Rough sketch of it would look like this:

template <typename RandomSource, typename Distribution>
class independent_distributed_variable {
 Distribution distribution;
public:
 template <typename ... Args>
 random_variable(Args&& ... args):
 distribution(std::forward<Args>(args)...)
 {}
 auto operator()() {
 auto& source = static_cast<RandomSource&>(*this);
 return distribution(source);
 }
 template <typename OutputIterator>
 void operator()(OutputIterator first, OutputIterator) {
 // use `std::generate` and delegate to singular version
 // or use batch filling 
 }
};

One might still have to write some wrapper classes (in case interfaces are not compatible). This is not the most important problem though.

The random variable description language

This is the most "C++" part of the problem. Lets nail down the usage first. The random variables in your case:

\$R \sim N(0.5, 0.15)\$

\$label\ \sim B(1, 0.5)\$

\$X \sim (-\cos (4 * \pi * R) + R) * (-1)^{label}\$

\$Y \sim (\sin (4 * \pi * R) + R) * (-1)^{label}\$

(Sorry if I butchered the terminology and notation, just wanted to write it down in a more readable way)

The closest approximation in C++ would be (given the library namespace shino):

auto r = shino::normal_distribution(0.5, 0.15);
auto label = shino::bernoulli_distribution(0.5);
auto x = (-shino::cos(4 * shino::pi * r) + r) * shino::power(-1, label);
auto y = (shino::sin(4 * shino::pi * r) + r) * shino::power(-1, label);

Then people could just fill their own struct with a loop or std::generate if they want. I will give motivations later on leaving this part open ended.

Implementation guidelines

The naive implementation is pretty easy. Lets consider sine expression.

template <typename AlphaVariable>
class sine_expression {
 AlphaVariable alpha;
public:
 sine_expression(AlphaVariable alpha):
 alpha(alpha)
 {}
 auto operator()() {
 return std::sin(alpha()); // evaluate the dependency first
 }
};
template <typename AlphaVariable>
sine_expression<AlphaVariable> sin(AlphaVariable alpha) {
 return {alpha};
}

There is one problem that springs out: the implementer will need to differentiate between temporary expressions and expressions saved into a variable (sin(x + y) vs just sin(x), the former will have temporary plus_expression, the latter will have lvalue of whatever type is x). The problem is definitely solvable, albeit requiring more code.

Importance of user provided data structures

In computer vision field, most trained models I encountered want tightly packed image pixels (e.g. 3 channel image of float channels will have pixels at addresses multiple of 12, not 16). By default compiler will pad the struct of 3 floats to 16 bytes, which will break the inference. As such there is sometimes a need to fill weird structs and write at unusual memory offsets.

Question 10

There is also an edge case: r is "rolled" each time per usage, while label is not. I will try to fix that tomorrow and amend my answer.

Question 11

I like this approach. I had also wondered about using C++20 range adapters.

Question 12

what's "shino"?

Question 13

@NooneAtAll, it is the character in my avatar. I usually either call my library namespaces with strongly related words or with the nickname I like.

Question 14

Design:

Data is not a descriptive name, and also misrepresents what the class does. It's really a generator function that's triggered by calling get().

As such, it would be better to replace it with a function, called something like generate_spirals().

We should use the C++ standard library random header for random number generation. This gives us various random number generators, e.g. std::mt19937, that we could use instead of the custom Random class.

Lastly, I'd suggest doing the labelling and inversion as separate stages. So we instead generate two spirals using the same code, and invert one of them. That way we don't need the label in the point class, we can just store two separate vectors of (x, y) coordinates and add the label when printing them.

So something like:

#include <random>
#include <vector>
#include <cmath>
struct Point {
 double x;
 double y;
};
std::vector<Point> generate_spiral(unsigned int samples, double noise, double spread, std::mt19937& rng) {
 
 auto points = std::vector<Point>();
 points.reserve(samples);
 
 auto dist = std::uniform_real_distribution<double>(0.0, 1.0);
 
 for (auto n = 0u; n != samples; ++n) {
 double r = 3.14 * spread * dist(rng);
 double x = -std::cos(r) * r + noise * dist(rng);
 double y = std::sin(r) * r + noise * dist(rng);
 points.push_back({ x, y });
 }
 
 return points;
}
std::vector<Point> invert_points(std::vector<Point> points) {
 for (auto& p : points) {
 p.x = -p.x;
 p.y = -p.y;
 }
 return points;
}
#include <iostream>
int main() {
 
 auto rng = std::mt19937(std::random_device()());
 
 auto samples = 200u;
 auto noise = 1.0;
 auto spread = 4.0;
 auto s1 = generate_spiral(samples, noise, spread, rng);
 auto s2 = invert_points(generate_spiral(samples, noise, spread, rng));
 
 for (auto const& p : s1)
 std::cout << p.x << " " << p.y << " " << "1" << "\n";
 
 for (auto const& p : s2)
 std::cout << p.x << " " << p.y << " " << "0" << "\n";
}

Note the range-based for loops using auto make the output code much less verbose.

Note that << std::endl flushes the output as well as adding a new line. Since we don't need the flushing behavior, we should use << "\n" instead.

Question 15

This is more an amendment to Edward's answer, but it's a bit too long to do in comments.

In order to seed the pseudorandom number generator (PRNG), you should provide sufficient data to initialise the PRNG state.

The common (and I would argue obvious) approach of

std::random_device rd;
std::mt19937 gen(rd());

would be nice if it worked but this is C++ where the obvious way of doing things is often wrong. The reason this is insufficient is that operator() on std::random_device returns an unsigned int, which is permitted to be as small as 16 bits (but will probably be 32 bits on common systems). Meanwhile, the std::mt19937 PRNG has an internal state of 19937 bits, which suggests the above approach provides much too little data.

The approach the standard library takes is to use std::seed_seq, which takes a sequence of numbers, combining the lower 32 bits of each. In order to provide this, we can use a vector:

std::vector<std::uint_least32_t> seed_data;

We also need to get std::uint_least32_t out of the random device that supplies unsigned int (which might be only 16 bits), so we use a uniform distribution:

std::uniform_int_distribution<std::uint_least32_t> rd_dist;

As an aside: this will provide a distribution over the whole range of std::uint_least32_t: if you're worried about "wasting entropy" on a hypothetical system where this type is wider than 32 bits you could explicitly specify the range and hope that the provided implementation takes advantage of this to pull less data from the std::random_device:

std::uniform_int_distribution<std::uint_least32_t> rd_dist{ 0, UINT32_C(0xffffffff) };

The above is almost certainly unnecessary though.

The amount of data we need to fill this vector can be determined from properties on the std::mt19937 type. Unfortunately C++ does not directly provide a property that gets this directly, but it can be computed as follows:

constexpr std::size_t seed_size = std::mt19937::state_size * (std::mt19937::word_size / 32);

If you decide to replace std::mt19937 with std::mt19937_64 or another instantiation of the std::mersenne_twister_engine template, just replace the std::mt19937 in the above line and it should still work.

So we then fill the vector:

seed_data.reserve(seed_size);
std::generate_n(std::back_inserter(seed_data), seed_size, [&]() { return rd_dist(rd); });

Then create the seed sequence and create the generator:

std::seed_seq seed(seed_data.cbegin(), seed_data.cend());
std::mt19937 gen{ seed };

Putting that all together:

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <random>
#include <vector>
/* ... */
std::random_device rd;
std::uniform_int_distribution<std::uint_least32_t> rd_dist;
std::vector<std::uint_least32_t> seed_data;
constexpr std::size_t seed_size = std::mt19937::state_size * (std::mt19937::word_size / 32);
seed_data.reserve(seed_size);
std::generate_n(std::back_inserter(seed_data), seed_size, [&]() { return rd_dist(rd); });
std::seed_seq(seed_data.cbegin(), seed_data.cend());
std::mt19937 gen{ seed_seq };

I'm sure most people would agree that's a perfectly succinct and obvious way to initialise a PRNG. 🙃

As noted by Edward in the comments to their answer, this is still not ideal due to issues in the way std::seed_seq combines its values, and the fact that in their wisdom the C++ folks decided that it was perfectly ok to have std::random_device be deterministic (rather than, say, being a compile error on systems that don't provide a sufficient source of entropy, which would alert the programmers to the issue early on rather than being a potential source of runtime bugs).

Nevertheless, the general principle you should follow is to provide sufficient entropy to match the state of the PRNG, rather than just a single unsigned int.

Question 16

This is a valuable contribution. I am going to see about basing a question on this in the next few days. Thanks very much for taking the time to write it up!

Edward Edward 67.2k4 gold badges120 silver badges284 bronze badges · Accepted Answer · 2021-01-02 12:47:05Z

Here are some things that may help you improve your program.

Separate interface from implementation

The interface goes into a header file and the implementation (that is, everything that actually emits bytes including all functions and data) should be in a separate .cpp file. The reason is that you might have multiple source files including the .h file but only one instance of the corresponding .cpp file. In other words, split your existing data.hpp and util.hpp files into a .h file and a .cpp file.

Understand the use of namespaces

Your code is correctly using <cmath> instead of <math.h>. The difference between the two forms is that the former defines things within the std:: namespace versus into the global namespace. So this implies that instead of using cos and sin, the code should be using std::cos and std::sin

Use `const` where practical

There are a number of named constants in the program such as unsigned int n_dims = 2;. It's very good practice to name these, but even better is to declare them const or better still constexpr. The reason is that it allows the compiler to perform better optimizations without sacrificing readability.

Don't use `std::endl` if you don't really need it

The difference betweeen std::endl and '\n' is that '\n' just emits a newline character, while std::endl actually flushes the stream. This can be time-consuming in a program with a lot of I/O and is rarely actually needed. It's best to only use std::endl when you have some good reason to flush the stream and it's not very often needed for simple programs such as this one. Avoiding the habit of using std::endl when '\n' will do will pay dividends in the future as you write more complex programs with more I/O and where performance needs to be maximized.

Use "range `for`" and simplify your code

The code currently includes these lines:

for (std::vector<Point>::iterator it = d.begin(); it != d.end(); ++it) {
 std::cout << it->x << " " << it->y << " " << it->label << std::endl;
}

Since C++11, however, we have a range for which simplifies it to this:

for (const auto& point : d) {
 std::cout << point.x << " " << point.y << " " << point.label << '\n';
}

Write a stream inserter for your class

We can further refine the for loop above by creating a custom inserter for the Point class:

std::ostream& operator<<(std::ostream& out, const Point& point) {
 return out << point.x << " " << point.y << " " << point.label;
}

Now our loop in main is even simpler:

for (const auto& point : d) {
 std::cout << point << '\n';
}

Use standard library algorithms

We can use std::copy to copy from the vector to std::cout, replacing the loop with a single statement:

std::copy(d.begin(), d.end(), std::ostream_iterator<Point>(std::cout, "\n"));

Consider using a better random number generator

If you are using a compiler that supports at least C++11, consider using a better random number generator. In particular, instead of rand, you might want to look at std::uniform_real_distribution and friends in the <random> header. Also, instead of doing this:

if (random.rand() < 0.5) {
 it->x = r1;
 it->y = r2;
 it->label = 1;
} else {
 it->x = -r1;
 it->y = -r2;
 it->label = 0;
}

I would suggest using a std::bernoulli_distribution.

Rethink the classes

Rather than having the vaguely named Data class creating Points, I'd suggest moving that functionality into the Point class. Since this is actually creating a Point, it makes sense to me to create a custom constructor with this functionality:

Point::Point() {
 static std::random_device rd;
 static std::mt19937 gen(rd());
 static std::uniform_real_distribution<> spread(0.0, 4.0*3.14);
 static std::uniform_real_distribution<> noise(0.0, 1.0);
 static std::bernoulli_distribution type(0.5);
 static constexpr double multiplier[]{-1.0, +1.0};
 label = type(gen);
 double r = spread(gen);
 x = multiplier[label] * (-std::cos(r) * r + noise(gen));
 y = multiplier[label] * (+std::sin(r) * r + noise(gen));
}

Minimize memory use

It's not critical for this program, but rather than storing everything in a vector only to print it and then destroy the vector, a more memory efficient approach would be to generate each point as needed and emit it directly. Using the Point constructor and stream inserter as defined above, this becomes trivial.

Result

Here's the entire program as rewritten with all of these ideas:

#include <random>
#include <iostream>
#include <iterator>
#include <cmath>
struct Point {
 Point();
 double x;
 double y;
 unsigned int label;
};
Point::Point() {
 static std::random_device rd;
 static std::mt19937 gen(rd());
 static std::uniform_real_distribution<> spread(0.0, 4.0*3.14);
 static std::uniform_real_distribution<> noise(0.0, 1.0);
 static std::bernoulli_distribution type(0.5);
 static constexpr double multiplier[]{-1.0, +1.0};
 label = type(gen);
 double r = spread(gen);
 x = multiplier[label] * (-std::cos(r) * r + noise(gen));
 y = multiplier[label] * (+std::sin(r) * r + noise(gen));
}
std::ostream& operator<<(std::ostream& out, const Point& point) {
 return out << point.x << " " << point.y << " " << point.label;
}
int main() {
 for (auto i{3000}; i; --i) {
 std::cout << Point() << '\n';
 }
}

Rather than for (auto i{3000}; i; --i) ..., you could end with std::generate_n(std::ostream_iterator<Point>(std::cout, "\n"), 3000, []{ return Point(); })'
@Caleth: True, and I considered it but opted for the simpler version shown.
You are seeding the pseudorandom number generator with a single unsigned int from the random device. The state size of std::mt19937 is 19937 bits, so it is unlikely (although technically permitted by the standard) that a single unsigned int will be enough: it will work but will only allow access to a tiny subset of the possible initial states. It would be better to use a std::seed_seq instead.
@antispinwards: you're right that the current seed is not very good. However, I'm not sure how to do better, even with std::seed_seq. See this for example. Ideas on how to better address this would be most welcome!
@Edward - yes, std::seed_seq is less than ideal (barring the C++ folks having a fit of sanity and bugfixing it in a way that doing the correct thing with sufficient data is given priority over doing something with insufficient data), but that doesn't mean we should just throw up our hands and seed the generator in a way that can provide at most 2^32 possibilities (for common definitions of unsigned int) out of 2^19937. 2^32 isn't a particularly large number these days for this kind of thing.

Stack Exchange Network

Simple dataset class in C++

5 Answers 5

Separate interface from implementation

Understand the use of namespaces

Use `const` where practical

Don't use `std::endl` if you don't really need it

Use "range `for`" and simplify your code

Write a stream inserter for your class

Use standard library algorithms

Consider using a better random number generator

Rethink the classes

Minimize memory use

Result

Unnecessary Class

Readability Review

Use good namings

Nitpicks

Perspective

Underlying problem

The moving parts

The random variable description language

Implementation guidelines

Importance of user provided data structures

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Simple dataset class in C++

5 Answers 5

Separate interface from implementation

Understand the use of namespaces

Use const where practical

Don't use std::endl if you don't really need it

Use "range for" and simplify your code

Write a stream inserter for your class

Use standard library algorithms

Consider using a better random number generator

Rethink the classes

Minimize memory use

Result

Unnecessary Class

Readability Review

Use good namings

Nitpicks

Perspective

Underlying problem

The moving parts

The random variable description language

Implementation guidelines

Importance of user provided data structures

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

Use `const` where practical

Don't use `std::endl` if you don't really need it

Use "range `for`" and simplify your code