Given the following constraints, is the optimization in this code correct?
- Before an object is transferred to another thread, its reference count is incremented and is_shared is set to true.
- Every time an object reaches a reference count of 1, is_shared is set to false.
Header_is_unique is 100% thread safe.
The idea behind the optimization is to try to reduce the number of atomic operations.
Is this a fool's errand? Is this kind of optimization impossible to do in a thread safe manner without using more atomic values and/or mutexes, thus invalidating the idea of the optimization itself?
I only want to target the GCC and Clang compilers.
#include <stdint.h>

typedef struct Header {
    uint8_t is_shared;
    uint64_t reference_count;
} Header;

void
Header_acquire(
    Header * const self
) {
    if (!self->is_shared) {
        ++self->reference_count;
    } else {
        __atomic_fetch_add(&self->reference_count, 1, __ATOMIC_RELAXED);
    }
}

uint8_t
Header_release(
    Header * const self
) {
    if (!self->is_shared) {
        return !--self->reference_count;
    }
    uint64_t const reference_count =
        __atomic_fetch_sub(&self->reference_count, 1, __ATOMIC_RELEASE);
    if (reference_count == 2) {
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        self->is_shared = 0;
    }
    return reference_count == 1;
}

uint8_t
Header_is_unique(
    Header const * const self
) {
    return !self->is_shared && self->reference_count == 1;
}
1 Answer
Use C11 atomic types and functions
C11 added atomic types and operations to the standard, in <stdatomic.h>. Prefer those over non-atomic types and non-standard compiler builtins.
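As a rough illustration of what the standard interface looks like, here is a minimal sketch of an always-atomic reference count written with <stdatomic.h>. It deliberately leaves out is_shared, so it is not a drop-in replacement for the code in the question; Header_init is a hypothetical helper added for the example, and the memory orders simply mirror the ones used in the question.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: a simplified Header with only an atomic counter (no is_shared). */
typedef struct Header {
    atomic_uint_fast64_t reference_count;
} Header;

/* Hypothetical helper: a new object starts with a single owner. */
void Header_init(Header *self) {
    atomic_init(&self->reference_count, 1);
}

void Header_acquire(Header *self) {
    /* Taking another reference needs no ordering with other memory accesses. */
    atomic_fetch_add_explicit(&self->reference_count, 1, memory_order_relaxed);
}

/* Returns true when the caller dropped the last reference. */
bool Header_release(Header *self) {
    if (atomic_fetch_sub_explicit(&self->reference_count, 1,
                                  memory_order_release) == 1) {
        /* Synchronize with earlier releases before the object is destroyed. */
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}

bool Header_is_unique(Header *self) {
    return atomic_load_explicit(&self->reference_count,
                                memory_order_acquire) == 1;
}
```

With this shape every access to the counter goes through an atomic operation, so there is no flag whose value can go stale between a check and the update that depends on it.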
Be careful combining multiple atomic operations
It's a mistake to think that since you use atomic operations on the members of Header, higher-level operations on Header, like acquiring and releasing, are also atomic. Consider:
    if (!self->is_shared) {
        return !--self->reference_count;
    }
Between the load !self->is_shared and the decrement --self->reference_count, is_shared could potentially have been modified by another thread.
The same goes for checking the reference count and setting is_shared to zero later on in that function, and for Header_is_unique().
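To make the second case concrete, the sketch below replays one possible interleaving on a single thread, so it is deterministic and contains no actual data race. The function name replay_interleaving and the roles of threads A, B and C are assumptions made for the illustration; the individual steps are taken from Header_release() and from the sharing rule stated in the question.

```c
#include <stdint.h>

typedef struct Header {
    uint8_t is_shared;
    uint64_t reference_count;
} Header;

/* Replays one interleaving step by step; returns the final is_shared. */
uint8_t replay_interleaving(Header *h) {
    /* Start: threads A and B each hold one reference. */
    h->is_shared = 1;
    h->reference_count = 2;

    /* B, Header_release(), first half: the decrement returns the old value 2. */
    uint64_t const old =
        __atomic_fetch_sub(&h->reference_count, 1, __ATOMIC_RELEASE);

    /* A hands the object to a third thread C before B continues:
       the count is incremented and is_shared is set, as the rule requires. */
    __atomic_fetch_add(&h->reference_count, 1, __ATOMIC_RELAXED);
    h->is_shared = 1;

    /* B, Header_release(), second half: acts on its stale observation. */
    if (old == 2) {
        __atomic_thread_fence(__ATOMIC_ACQUIRE);
        h->is_shared = 0;
    }
    return h->is_shared;
}
```

After this sequence the reference count is 2 (held by A and C) while is_shared is 0, so both threads would take the non-atomic path in Header_acquire() and Header_release().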
Avoid premature optimization
The idea behind the optimization is to try to reduce the number of atomic operations.
You added a conditional check to Header_release() and Header_acquire() that you pay for every time you call those functions. And if the object is not shared, no other threads are simultaneously trying to access that Header object, so the atomic operations are likely to be cheap.
Measure the performance in a real scenario, with and without is_shared, and then decide whether the added complication is worth it.
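As a starting point only, a micro-benchmark along the following lines can give a first impression of the uncontended cost of an atomic increment compared to a plain one. The function names and the use of clock() are assumptions made for this sketch, and a tight loop like this is not the real scenario recommended above, so treat its numbers as a rough hint at best.

```c
#include <stdint.h>
#include <time.h>

/* Times `iterations` plain increments; returns elapsed CPU seconds. */
double time_plain_increments(uint64_t *counter, uint64_t iterations) {
    clock_t const start = clock();
    for (uint64_t i = 0; i < iterations; ++i) {
        ++*counter;
        /* Compiler barrier (GCC/Clang) so the loop is not folded into one add. */
        __asm__ volatile("" ::: "memory");
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Times `iterations` uncontended atomic increments; returns elapsed CPU seconds. */
double time_atomic_increments(uint64_t *counter, uint64_t iterations) {
    clock_t const start = clock();
    for (uint64_t i = 0; i < iterations; ++i) {
        __atomic_fetch_add(counter, 1, __ATOMIC_RELAXED);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling both functions with the same large iteration count and comparing the returned times gives a baseline; the same harness can then be pointed at the two variants of Header_acquire() and Header_release().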
is_shared ever becomes non-zero.
if (reference_count == 2) means that now only a single thread has a reference to self, but this is *another* thread, not the current one. See stackoverflow.com/a/79320186/6401656 for more.