My code: https://ideone.com/DZeIZv
#include <atomic>

template <class T>
class Seqlock {
    std::atomic<int> seq_;
    T val_;

public:
    Seqlock(T value = T())
        : val_(value) {
    }

    // concurrent calls are NOT allowed
    void store(T value) {
        const int seq0 = seq_.load(std::memory_order_relaxed);
        seq_.store(seq0 + 1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release);
        val_ = value;
        std::atomic_thread_fence(std::memory_order_release);
        seq_.store(seq0 + 2, std::memory_order_relaxed);
    }

    // concurrent calls are allowed
    T load() const {
        for (;;) {
            const int seq0 = seq_.load(std::memory_order_relaxed);
            if (seq0 & 1) {
                // cpu_relax()
                continue;
            }
            std::atomic_thread_fence(std::memory_order_acquire);
            T ret = val_;
            std::atomic_thread_fence(std::memory_order_acquire);
            const int seq1 = seq_.load(std::memory_order_relaxed);
            if (seq0 == seq1) {
                return ret;
            }
        }
    }
};
Is this seqlock implementation correct across all architectures, at least when T is an integral type? Can it be improved?
References I was following:
1 Answer
My guideline on memory orders is "Just say no"; I guarantee that if you're using them, you're using them wrong. :) So I won't attempt to find the exact bug; I'll just assume they all say seq_cst. The only thing I'll say about your memory orders is: when you write

    seq_.store(seq0 + 1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);

can you explain how that's different from

    seq_.store(seq0 + 1, std::memory_order_release);

?
    Seqlock(T value = T())

This constructor should be explicit; otherwise you're accidentally permitting

    Seqlock<int> s = 42;

and in fact, because of C++17 CTAD, you're also permitting

    Seqlock s = 42;

Rule of thumb: make all constructors explicit, except for those you want the compiler to be able to call implicitly (i.e., copy and move constructors).
In T load(), you load seq1 at the bottom of the loop, and then go around again and immediately load seq0 from the same location. You could just have set seq0 = seq1; i.e.
    T load() const {
        int seq1 = seq_.load(std::memory_order_relaxed);
        while (true) {
            int seq0 = seq1;
            if (seq0 & 1) {
                // cpu_relax();
                seq1 = seq_.load(std::memory_order_relaxed); // must re-read, or we spin forever on a stale odd value
            } else {
                std::atomic_thread_fence(std::memory_order_acquire);
                T ret = val_;
                std::atomic_thread_fence(std::memory_order_acquire);
                seq1 = seq_.load(std::memory_order_relaxed);
                if (seq0 == seq1) {
                    return ret;
                }
            }
        }
    }
- Thanks for your reply. "Just say no" to barriers is a good approach, but sometimes we can't. In my case, I need to implement an alternative to 64-bit atomics on 32-bit platforms, and using seqlocks is a commonly used solution for this. – gavv, May 25, 2020 at 7:32
- atomic_thread_fence() is different from passing a memory order to store() in the scope to which the order applies: in the first case it applies to all stores and loads of all variables, and in the second case it applies only to the specific variable. This is what the spec says, but unfortunately I don't know the exact difference in the instructions produced by the compiler. – gavv, May 25, 2020 at 7:33
- FWIW, I didn't "just say no" to barriers (inter-thread synchronization) in general; just to using memory orders to get there. Reasoning about seq_cst atomics is difficult but possible. It's when you start throwing in orders like acquire and release that it gets inhumanly confusing. – Quuxplusone, May 25, 2020 at 14:31
- Re the difference between "applies to all stores and loads of all variables" and "applied only to a specific variable": yeah, that's what I meant. Can you explain the difference between these two in this situation? It seems like the only non-atomic variable involved is val_, which is overwritten right after the fence. Hmm, since you don't want the hardware to hoist the store to val_ before the increment of seq_, it feels like maybe you want an acquire there, not a release. ...This is the confusing part that I recommend saying "no" to. Just seq_cst! It's the default for a reason. – Quuxplusone, May 25, 2020 at 14:36
- "FWIW, I didn't 'just say no' to barriers (inter-thread synchronization) in general; just to using memory orders to get there." Thanks, this is indeed good advice which I will follow in higher-level code. However, using 2 or 3 full fences in such a low-level thing as a seqlock (i.e., small on the one hand yet widely used on the other) would clearly be overkill. – gavv, May 31, 2020 at 17:53