C++ thread-safe averaging

Question 1

Although having very little C++ experience, I was tasked to implement several statistics objects that are safe to be asynchronous accessed by different threads, that collect data in relatively small memory caches and do the math when asked. Many of those I need to write are based on averaging, so help me by inspecting basic averaging class. I just hope to avoid any strategic errors in design of my cargo-cult code-gluing efforts.

#include <mutex>
template <typename T, int N> class Average {
 T data[N];
 int head;
 double avg;
 int itr;
 int counter;
 std::mutex _mutex;
public:
Average() : head(0), avg(0.), itr(0), counter(0) {
};
void empty() {
 std::unique_lock<std::mutex> lock(this->_mutex);
 head = 0;
 avg = 0.;
 itr = 0;
 counter = 0;
}
int count() {
 std::unique_lock<std::mutex> lock(this->_mutex);
 return counter;
}
void push(T value) {
 std::unique_lock<std::mutex> lock(this->_mutex);
 counter++;
 data [head] = value;
 double tmp = 0.;
 if (++head == N) {
 head = 0;
 for (int i = 0; i < N; i++) {
 tmp = tmp + data[i];
 }
 tmp = tmp / N;
 avg = (avg * itr) + tmp;
 itr += 1;
 avg = avg / itr;
 }
}
double average() {
 std::unique_lock<std::mutex> lock(this->_mutex);
 double tmp = 0.;
 if (head == 0){
 if (itr == 0) return 0.0;
 return avg;
 }
 for (int i = 0; i < head; i++) {
 tmp = tmp + data[i];
 }
 if (itr == 0) return tmp / head;
 tmp = tmp/N;
 return ((avg * itr) + tmp) / ((double)itr + (double)head/(double)N);
 }
};

As an usage example, by extending the Average class, I implemented a RunningAverage class which averages only let say last N pushed values. I got a worker pool that perform Monte Carlo simulation where it is preferred to maintain step acceptance rate 0.5+/-0.1 and when rate falls outside, I adjust model parameters. So I got something like:

#define ACCEPT_STEP 1
#define DECLINE_STEP 0
RunningAverage <float, ACC_BUFFER_LEN> global_acceptance_rate;

and thread do something like:

global_acceptance_rate.push(ACCEPT_STEP); // or
global_acceptance_rate.push(DECLINE_STEP);

So this allows me to dynamically control simulation.

Question 2

Could you add an example of how to use it?

Question 3

I’m not experienced with asynchronous access. What is the purpose of the mutex in count()? It doesn’t modify anything. Is it dangerous to read a variable while it is being updated elsewhere? I’m just curious...

Question 4

@CrisLuengo: Technically, on some platforms you might be able to read a partially updated value (e.g. bytes 1 and 2 of an int32_t have been updated, bytes 3 and 4 haven't yet), which could cause problems. The std::unique_lock would prevent that from happening. (It isn't an issue for int on x86 platforms, but that cannot be generalized to all platforms.)

Question 5

Code

Prefer std::array over plain C-style arrays. It has more normal copy / assignment semantics (which admittedly isn't relevant here), but also allows you to access the size directly from the array.
Naming:
- head is usually used for the first populated element (of a list, for example). In this case it refers to the first empty space instead, so next might be a more appropriate name.
- avg, itr: prefer full names over abbreviations.
- _mutex: the underscore prefix is usually used to denote a member, but this is the only member that has one?
- sum would be a better name than tmp. It's also better to declare variables as close as possible to the point of use.
```
double sum = 0.;
for (auto i : data) // range-based for loop for brevity
 sum += i;
```

Sum loops like this may also be written using the std::accumulate algorithm:

 #include <numeric>
 ...
 using std::begin;
 using std::end;
 auto sum = std::accumulate(begin(data), end(data), 0.0);

When referring to _mutex in member functions, the code uses this->_mutex. Which rather defeats the point of using the underscore prefix. No other members are referred to with this->.
The itr and head variables are unnecessary, as they can both be trivially calculated: itr = counter / N; head = counter % N;
Getter functions, e.g.count(), average() should be const. Note that for this to compile, the mutex member must then be made mutable (one of the few instances where mutable is to be recommended).

Design

This appears to be a Cumulative Moving Average. Data storage is actually unnecessary for such an average! (Which can be seen in the existing code since the data is periodically overwritten. There is no advantage in creating an Average<double, 1297> over an Average<double, 1>. The average() function will return the same result for the same data regardless.)

Since the data is private and cannot be accessed externally, it becomes irrelevant... As such, most of the code can be deleted! :)

class CumulativeMovingAverage
{
 std::size_t n;
 double average;
public:
 CumulativeMovingAverage(): n(0u), average(0.0) { }
 std::size_t get_count() const
 {
 return n;
 }
 double get_average() const
 {
 return average;
 }
 void update(double x)
 {
 // formula straight from wikipedia:
 //average = (x + n * average) / (n + 1u);
 //++n;
 // improved formula from Toby Speight (see comments):
 ++n;
 average += (x - average) / n;
 }
};

Thread safety / templateyness is left as an exercise to the reader (the original code looks correct to me).

Question 6

Actually, a cumulative mean can be updated with ++n; average += (x - average) / n; (fewer operations; less risk of exceeding range). (ref) (example)

Question 7

Also, you might want to mention that the sum += loop can be written using std::accumulate() instead, before we dispense with it completely.

Question 8

Prefer std::array over plain C-style arrays. Could you elaborate on this point? Maybe add reasons why? Seems a bit like cargo-cult otherwise.

user673679 user673679 12.2k2 gold badges34 silver badges65 bronze badges · Accepted Answer · 2018-08-23 08:36:38Z

Code

Prefer std::array over plain C-style arrays. It has more normal copy / assignment semantics (which admittedly isn't relevant here), but also allows you to access the size directly from the array.
Naming:
- head is usually used for the first populated element (of a list, for example). In this case it refers to the first empty space instead, so next might be a more appropriate name.
- avg, itr: prefer full names over abbreviations.
- _mutex: the underscore prefix is usually used to denote a member, but this is the only member that has one?
- sum would be a better name than tmp. It's also better to declare variables as close as possible to the point of use.
```
double sum = 0.;
for (auto i : data) // range-based for loop for brevity
 sum += i;
```

Sum loops like this may also be written using the std::accumulate algorithm:

 #include <numeric>
 ...
 using std::begin;
 using std::end;
 auto sum = std::accumulate(begin(data), end(data), 0.0);

When referring to _mutex in member functions, the code uses this->_mutex. Which rather defeats the point of using the underscore prefix. No other members are referred to with this->.
The itr and head variables are unnecessary, as they can both be trivially calculated: itr = counter / N; head = counter % N;
Getter functions, e.g.count(), average() should be const. Note that for this to compile, the mutex member must then be made mutable (one of the few instances where mutable is to be recommended).

Design

This appears to be a Cumulative Moving Average. Data storage is actually unnecessary for such an average! (Which can be seen in the existing code since the data is periodically overwritten. There is no advantage in creating an Average<double, 1297> over an Average<double, 1>. The average() function will return the same result for the same data regardless.)

Since the data is private and cannot be accessed externally, it becomes irrelevant... As such, most of the code can be deleted! :)

class CumulativeMovingAverage
{
 std::size_t n;
 double average;
public:
 CumulativeMovingAverage(): n(0u), average(0.0) { }
 std::size_t get_count() const
 {
 return n;
 }
 double get_average() const
 {
 return average;
 }
 void update(double x)
 {
 // formula straight from wikipedia:
 //average = (x + n * average) / (n + 1u);
 //++n;
 // improved formula from Toby Speight (see comments):
 ++n;
 average += (x - average) / n;
 }
};

Thread safety / templateyness is left as an exercise to the reader (the original code looks correct to me).

Actually, a cumulative mean can be updated with ++n; average += (x - average) / n; (fewer operations; less risk of exceeding range). (ref) (example)
Also, you might want to mention that the sum += loop can be written using std::accumulate() instead, before we dispense with it completely.
Prefer std::array over plain C-style arrays. Could you elaborate on this point? Maybe add reasons why? Seems a bit like cargo-cult otherwise.

Stack Exchange Network

C++ thread-safe averaging

1 Answer 1

Code

Design

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

C++ thread-safe averaging

1 Answer 1

Code

Design

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions