My code: https://ideone.com/DZeIZv
#include <atomic>

template <class T>
class Seqlock {
    std::atomic<int> seq_;
    T val_;

public:
    Seqlock(T value = T())
        : val_(value) {
    }

    // concurrent calls are NOT allowed
    void store(T value) {
        const int seq0 = seq_.load(std::memory_order_relaxed);
        seq_.store(seq0 + 1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release);
        val_ = value;
        std::atomic_thread_fence(std::memory_order_release);
        seq_.store(seq0 + 2, std::memory_order_relaxed);
    }

    // concurrent calls are allowed
    T load() const {
        for (;;) {
            const int seq0 = seq_.load(std::memory_order_relaxed);
            if (seq0 & 1) {
                // cpu_relax()
                continue;
            }
            std::atomic_thread_fence(std::memory_order_acquire);
            T ret = val_;
            std::atomic_thread_fence(std::memory_order_acquire);
            const int seq1 = seq_.load(std::memory_order_relaxed);
            if (seq0 == seq1) {
                return ret;
            }
        }
    }
};
Is this seqlock implementation correct across all architectures, at least when T is an integral type? Can it be improved?
References I was following:
1 Answer
My guideline on memory orders is "Just say no"; I guarantee that if you're using them, you're using them wrong. :) So I won't attempt to find the exact bug; I'll just assume they all say seq_cst. The only thing I'll say about your memory orders is: when you write

    seq_.store(seq0 + 1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);

can you explain how that's different from

    seq_.store(seq0 + 1, std::memory_order_release);

?
    Seqlock(T value = T())

This constructor should be explicit; otherwise you're accidentally permitting

    Seqlock<int> s = 42;

and in fact, because of C++17 CTAD, you're also permitting

    Seqlock s = 42;

Rule of thumb: make all constructors explicit, except for those you want the compiler to be able to call implicitly (i.e., copy and move constructors).
In T load(), you load seq1 at the bottom of the loop, and then go around again and immediately load seq0 from the same location. You could just have set seq0 = seq1; i.e.
    T load() const {
        int seq1 = seq_.load(std::memory_order_relaxed);
        while (true) {
            int seq0 = seq1;
            if (seq0 & 1) {
                // cpu_relax();
                seq1 = seq_.load(std::memory_order_relaxed); // must re-read, or we spin forever on a stale odd value
            } else {
                std::atomic_thread_fence(std::memory_order_acquire);
                T ret = val_;
                std::atomic_thread_fence(std::memory_order_acquire);
                seq1 = seq_.load(std::memory_order_relaxed);
                if (seq0 == seq1) {
                    return ret;
                }
            }
        }
    }
- Thanks for your reply. "Just say no" to barriers is a good approach, but sometimes we can't. In my case, I need to implement an alternative to 64-bit atomics on 32-bit platforms, and using seqlocks is a commonly used solution for this. – gavv, May 25, 2020 at 7:32
- atomic_thread_fence() is different from passing a memory order to store() in the scope to which the order applies: in the first case it applies to all stores and loads of all variables, and in the second case it applies only to the specific variable. This is what the spec says, but unfortunately I don't know the exact difference in the instructions produced by the compiler. – gavv, May 25, 2020 at 7:33
- FWIW, I didn't "just say no" to barriers (inter-thread synchronization) in general; just to using memory orders to get there. Reasoning about seq_cst atomics is difficult but possible. It's when you start throwing in orders like acquire and release that it gets inhumanly confusing. – Quuxplusone, May 25, 2020 at 14:31
- Re the difference between "applies to all stores and loads of all variables" and "applied only to a specific variable": yeah, that's what I meant. Can you explain the difference between these two in this situation? It seems like the only non-atomic variable involved is val_, which is overwritten right after the fence. Hmm, since you don't want the hardware to hoist the store to val_ before the increment of seq_, it feels like maybe you want an acquire there, not a release. ...This is the confusing part that I recommend saying "no" to. Just seq_cst! It's the default for a reason. – Quuxplusone, May 25, 2020 at 14:36
- "FWIW, I didn't 'just say no' to barriers (inter-thread synchronization) in general; just to using memory orders to get there." Thanks, this is indeed good advice which I will follow in higher-level code. However, using 2 or 3 full fences in such a low-level thing as a seqlock (i.e., small on the one hand yet widely used on the other) would clearly be overkill. – gavv, May 31, 2020 at 17:53