I have a training dataset from which I want to draw samples in a random fashion such that all samples are used before getting randomly shuffled again. For this reason I implemented a simple random index generator.
For a dataset of 10 samples, the output looks something like this:
0 1 2 3 4 5 6 7 8 9
4 3 7 8 0 5 2 1 6 9
0 5 7 8 4 3 9 2 1 6
5 7 6 3 8 4 2 0 1 9
4 6 0 2 8 1 3 9 5 7
4 0 5 1 7 9 6 2 8 3
3 8 5 6 1 7 2 4 0 9
0 4 6 2 9 5 8 3 1 7
1 3 6 8 2 7 5 9 0 4
5 1 7 9 8 0 6 4 2 3
I would appreciate advice especially in the following areas:
- Code style (readability, naming conventions, etc...)
- Class design
- Efficiency (how to avoid unnecessary complexity)
- Reinventing the wheel (does the STL offer functionality that I should use?)
- Are there perhaps bugs that I am not seeing right now?
Please be as hard as possible with this implementation and give me constructive feedback.
main.cpp
#include <iostream>
#include "random_index.hpp"
int main() {
unsigned int size = 10;
RandomIndex rand_idx(size);
unsigned int n = 0;
for (unsigned int i=0; i<100; ++i, ++n) {
std::cout << rand_idx.get_index() << ' ';
if ((n+1) % size == 0) {
std::cout << '\n';
}
}
std::cout << '\n';
}
random_index.cpp
#include "random_index.hpp"
RandomIndex::RandomIndex(unsigned int _size) {
size = _size;
index.resize(_size, 0);
std::iota(index.begin(), index.end(), 0);
}
unsigned int RandomIndex::get_index() {
if (counter < size) {
return index[counter++];
} else {
counter = 0;
std::random_shuffle(index.begin(), index.end());
return index[counter++];
}
}
random_index.hpp
#ifndef RANDOM_INDEX_H
#define RANDOM_INDEX_H
#include <vector>
#include <numeric>
#include <algorithm>
class RandomIndex {
public:
RandomIndex(unsigned int _size);
unsigned int get_index();
private:
unsigned int size;
unsigned int counter = 0;
std::vector<unsigned int> index;
};
#endif
I compiled the code using the following command:
g++ -O -Wall main.cpp random_index.cpp
1 Answer 1
Overview
Your code produces the exact same set of samples each time you random_shuffle
. So the output is always
0 1 2 3 4 5 6 7 8 9
4 3 7 8 0 5 2 1 6 9
0 5 7 8 4 3 9 2 1 6
5 7 6 3 8 4 2 0 1 9
4 6 0 2 8 1 3 9 5 7
4 0 5 1 7 9 6 2 8 3
3 8 5 6 1 7 2 4 0 9
0 4 6 2 9 5 8 3 1 7
1 3 6 8 2 7 5 9 0 4
5 1 7 9 8 0 6 4 2 3
You should initialize the seed. You do that before you call random_shuffle
. If you do not do that it will give the same result each time since it relies on the seed.
You can fix that by calling std::srand
in the beginning.
int main() {
std::srand(std::time(0));
}
Now you can see the difference in the output
0 1 2 3 4 5 6 7 8 9
9 2 8 3 6 4 0 7 1 5
8 9 4 6 7 2 1 5 0 3
4 6 0 2 3 1 5 8 9 7
2 3 0 1 6 8 9 5 7 4
8 9 6 5 2 1 3 7 4 0
9 3 2 8 6 7 5 0 4 1
8 1 2 5 9 6 4 3 0 7
7 1 4 6 8 9 3 0 5 2
3 1 9 4 2 7 6 0 5 8
But the top row still will always be the same. The problem lies here
RandomIndex::RandomIndex(unsigned int _size) {
size = _size;
index.resize(_size, 0);
std::iota(index.begin(), index.end(), 0);
}
iota
will always will it starting with 0
and incrementing it will it reaches index.end()
. get_index
will only shuffle the container after the first row since that's when counter < size
evaluates to false.
To fix that you can also shuffle in the beginning, when you construct the vector.
Use std::shuffle
random_shuffle
has been deprecated. Use shuffle
with a random number generator so that your code can compile on the later c++ versions.
Avoid using _
as a prefix for your variables
There are some naming conventions you need to follow when it comes to underscores to avoid collisions. Moreover, it just looks ugly. To initialize size
in your class constructor you are better off using member initializer lists.
RandomIndex::RandomIndex(unsigned int size)
: size(size) {
//...
}
use an array
with a fixed size over vector
In your class size
is a value that is never going to change. If you plan on keeping it that way you can, use std::array
here with a template so you can have a fixed size. This will avoid all the resizing and heap fiddling resulting in faster execution. std::shuffle
will work the same way it does with vector
thanks to the way STL is designed.
template < size_t s >
class RandomIndex {
//...
private:
std::array < uint32_t, s > index;
};
std::sample
instead. \$\endgroup\$