Although having very little C++ experience, I was tasked to implement several statistics objects that are safe to be asynchronous accessed by different threads, that collect data in relatively small memory caches and do the math when asked. Many of those I need to write are based on averaging, so help me by inspecting basic averaging class. I just hope to avoid any strategic errors in design of my cargo-cult code-gluing efforts.
#include <mutex>
template <typename T, int N> class Average {
T data[N];
int head;
double avg;
int itr;
int counter;
std::mutex _mutex;
public:
Average() : head(0), avg(0.), itr(0), counter(0) {
};
void empty() {
std::unique_lock<std::mutex> lock(this->_mutex);
head = 0;
avg = 0.;
itr = 0;
counter = 0;
}
int count() {
std::unique_lock<std::mutex> lock(this->_mutex);
return counter;
}
void push(T value) {
std::unique_lock<std::mutex> lock(this->_mutex);
counter++;
data [head] = value;
double tmp = 0.;
if (++head == N) {
head = 0;
for (int i = 0; i < N; i++) {
tmp = tmp + data[i];
}
tmp = tmp / N;
avg = (avg * itr) + tmp;
itr += 1;
avg = avg / itr;
}
}
double average() {
std::unique_lock<std::mutex> lock(this->_mutex);
double tmp = 0.;
if (head == 0){
if (itr == 0) return 0.0;
return avg;
}
for (int i = 0; i < head; i++) {
tmp = tmp + data[i];
}
if (itr == 0) return tmp / head;
tmp = tmp/N;
return ((avg * itr) + tmp) / ((double)itr + (double)head/(double)N);
}
};
As an usage example, by extending the Average class, I implemented a RunningAverage class which averages only let say last N pushed values. I got a worker pool that perform Monte Carlo simulation where it is preferred to maintain step acceptance rate 0.5+/-0.1 and when rate falls outside, I adjust model parameters. So I got something like:
#define ACCEPT_STEP 1
#define DECLINE_STEP 0
RunningAverage <float, ACC_BUFFER_LEN> global_acceptance_rate;
and thread do something like:
global_acceptance_rate.push(ACCEPT_STEP); // or
global_acceptance_rate.push(DECLINE_STEP);
So this allows me to dynamically control simulation.
1 Answer 1
Code
- Prefer
std::array
over plain C-style arrays. It has more normal copy / assignment semantics (which admittedly isn't relevant here), but also allows you to access the size directly from the array. Naming:
head
is usually used for the first populated element (of a list, for example). In this case it refers to the first empty space instead, sonext
might be a more appropriate name.avg
,itr
: prefer full names over abbreviations._mutex
: the underscore prefix is usually used to denote a member, but this is the only member that has one?sum
would be a better name thantmp
. It's also better to declare variables as close as possible to the point of use.double sum = 0.; for (auto i : data) // range-based for loop for brevity sum += i;
Sum loops like this may also be written using the
std::accumulate
algorithm:#include <numeric> ... using std::begin; using std::end; auto sum = std::accumulate(begin(data), end(data), 0.0);
When referring to
_mutex
in member functions, the code usesthis->_mutex
. Which rather defeats the point of using the underscore prefix. No other members are referred to withthis->
.The
itr
andhead
variables are unnecessary, as they can both be trivially calculated:itr = counter / N; head = counter % N;
Getter functions, e.g.
count()
,average()
should beconst
. Note that for this to compile, the mutex member must then be mademutable
(one of the few instances wheremutable
is to be recommended).
Design
This appears to be a Cumulative Moving Average. Data storage is actually unnecessary for such an average! (Which can be seen in the existing code since the data is periodically overwritten. There is no advantage in creating an Average<double, 1297>
over an Average<double, 1>
. The average()
function will return the same result for the same data regardless.)
Since the data is private and cannot be accessed externally, it becomes irrelevant... As such, most of the code can be deleted! :)
class CumulativeMovingAverage
{
std::size_t n;
double average;
public:
CumulativeMovingAverage(): n(0u), average(0.0) { }
std::size_t get_count() const
{
return n;
}
double get_average() const
{
return average;
}
void update(double x)
{
// formula straight from wikipedia:
//average = (x + n * average) / (n + 1u);
//++n;
// improved formula from Toby Speight (see comments):
++n;
average += (x - average) / n;
}
};
Thread safety / templateyness is left as an exercise to the reader (the original code looks correct to me).
-
1\$\begingroup\$ Actually, a cumulative mean can be updated with
++n; average += (x - average) / n;
(fewer operations; less risk of exceeding range). (ref) (example) \$\endgroup\$Toby Speight– Toby Speight2018年08月23日 09:42:07 +00:00Commented Aug 23, 2018 at 9:42 -
1\$\begingroup\$ Also, you might want to mention that the
sum +=
loop can be written usingstd::accumulate()
instead, before we dispense with it completely. \$\endgroup\$Toby Speight– Toby Speight2018年08月23日 09:44:17 +00:00Commented Aug 23, 2018 at 9:44 -
1\$\begingroup\$
Prefer std::array over plain C-style arrays.
Could you elaborate on this point? Maybe add reasons why? Seems a bit like cargo-cult otherwise. \$\endgroup\$yuri– yuri2018年08月23日 16:54:17 +00:00Commented Aug 23, 2018 at 16:54
count()
? It doesn’t modify anything. Is it dangerous to read a variable while it is being updated elsewhere? I’m just curious... \$\endgroup\$int32_t
have been updated, bytes 3 and 4 haven't yet), which could cause problems. Thestd::unique_lock
would prevent that from happening. (It isn't an issue forint
on x86 platforms, but that cannot be generalized to all platforms.) \$\endgroup\$