Please review my implementation of a FIFO data structure that guarantees thread-safe access without locking.
I removed the license header block and all Doxygen comments. The original version (along with some tests) is available from here: https://github.com/fat-lobyte/atomic_queue
It should compile with:
- Clang >= 3.1, with -stdlib=libc++
- GCC >= 4.7, with -D_GLIBCXX_USE_SCHED_YIELD
- Visual Studio >= 2012 (_MSC_VER >= 1700), without the emplace_back() member function
I would ask you to check whether:
- It actually is thread safe (this is the main concern)
- It is compliant with the C++11 standard
- It performs reasonably and no resources are wasted
- It is possible to implement it more simply without losing performance or flexibility
Additionally, if someone is a real pro at lock-free programming, they could help me answer the following question: can I relax the memory ordering constraints on some of the load/store/exchange operations (by passing a std::memory_order constant)? I do not understand the memory model properly, and quite frankly I find it really difficult to grasp.
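To make the question concrete, here is the kind of change I mean, using the size counter as an example (an untested sketch, not part of the code below):

#include <atomic>
#include <cstddef>

std::atomic_size_t size_(0);

void increment_size()
{
    // The code below does "++size_", which defaults to sequentially
    // consistent ordering. Since size_ is only a rough statistic, I suspect
    // the increment could be relaxed, but I am not sure this is correct:
    size_.fetch_add(1, std::memory_order_relaxed);
}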
Any help would be much appreciated. Thanks in advance!
#ifndef ATOMIC_QUEUE_HPP_INCLUDED
#define ATOMIC_QUEUE_HPP_INCLUDED
#include <cstddef>
#include <memory>
#include <atomic>
#include <thread>
// At the time of writing, MSVC didn't know noexcept
#if defined(_MSC_VER) && _MSC_VER <= 1700
# define noexcept throw()
#endif
namespace aq {
namespace detail {
/** A node in the linked list. */
template <typename T>
struct node
{
T t /**< Value */;
std::atomic<node<T>*> next /**< Next node in list */;
};
template <typename T, typename Allocator = std::allocator<T> >
class atomic_queue_base
{
public:
atomic_queue_base(const Allocator& alc = Allocator()) noexcept
: size_(0u), front_(nullptr), back_(nullptr),
alc_(alc)
{ }
~atomic_queue_base() noexcept
{
node<T>* fr = front_;
while(fr)
{
node<T>* next = fr->next;
NodeAllocatorTraits::destroy(alc_, fr);
NodeAllocatorTraits::deallocate(alc_, fr, 1);
fr = next;
}
}
void push_back(const T& t)
{
auto new_node = NodeAllocatorTraits::allocate(alc_, 1);
try {
ValueAllocator alc(alc_);
ValueAllocatorTraits::construct(
alc,
&new_node->t, t
);
} catch(...)
{
NodeAllocatorTraits::deallocate(alc_, new_node, 1);
throw;
}
new_node->next = nullptr;
push_node(new_node);
}
void push_back(T&& t)
{
auto new_node = NodeAllocatorTraits::allocate(alc_, 1);
try {
ValueAllocator alc(alc_);
ValueAllocatorTraits::construct(
alc, &new_node->t, std::move(t)
);
} catch(...)
{
NodeAllocatorTraits::deallocate(alc_, new_node, 1);
throw;
}
new_node->next = nullptr;
push_node(new_node);
}
#if !(defined(_MSC_VER) && _MSC_VER <= 1700)
template<typename... Args>
void emplace_back(Args&&... args)
{
auto new_node = NodeAllocatorTraits::allocate(alc_, 1);
try {
ValueAllocator alc(alc_);
ValueAllocatorTraits::construct(
alc, &new_node->t, std::forward<Args>(args)...
);
} catch(...)
{
NodeAllocatorTraits::deallocate(alc_, new_node, 1);
throw;
}
new_node->next = nullptr;
push_node(new_node);
}
#endif
T* pop_front() noexcept
{
node<T>* old_front = front_;
node<T>* new_front;
do {
if (!old_front) return nullptr; // nothing to pop
new_front = old_front->next;
} while(!front_.compare_exchange_weak(old_front, new_front));
--size_;
// if the old front was also the back, the queue is now empty.
new_front = old_front;
if(back_.compare_exchange_strong(new_front, nullptr))
old_front->next = old_front;
return reinterpret_cast<T*>(old_front);
}
void deallocate(T* obj) noexcept
{
if (!obj) return;
// call destructor
NodeAllocatorTraits::destroy(alc_, reinterpret_cast<node<T>*>(obj));
// nodes with next == 0 are still referenced by an executing
// push_back() function and the next ptr will be modified.
// Since we don't want push_back() to write to deallocated memory, we hang
// in a loop until the node has a non-zero next ptr.
while(!reinterpret_cast<node<T>*>(obj)->next.load())
std::this_thread::yield();
NodeAllocatorTraits::deallocate(
alc_, reinterpret_cast<node<T>*>(obj), 1
);
}
std::size_t size() const noexcept
{ return size_; }
protected:
void push_node(node<T>* new_node) noexcept
{
node<T>* old_back = back_.exchange(new_node);
node<T>* null_node = nullptr;
// if front_ was set to null (no node yet), we have to update the front_
// pointer.
if(!front_.compare_exchange_strong(null_node, new_node))
// if front_ is not null, then there was a previous node.
// We have to update this nodes next pointer.
old_back->next = new_node;
++size_;
}
typedef Allocator ValueAllocator;
typedef std::allocator_traits<ValueAllocator> ValueAllocatorTraits;
// Rebind allocator traits for ValueAllocator to our own NodeAllocator
typedef
typename ValueAllocatorTraits::template rebind_traits<node<T> >
#if defined(_MSC_VER) && _MSC_VER <= 1700
::other
#endif
NodeAllocatorTraits;
// Get actual allocator type from traits
typedef typename NodeAllocatorTraits::allocator_type NodeAllocator;
std::atomic_size_t size_ /**< Current size of queue. Not reliable. */;
std::atomic<node<T>*> front_ /**< Front of the queue. */;
std::atomic<node<T>*> back_ /**< Back of the queue. */;
NodeAllocator alc_ /**< Allocator for node<T> objects. */;
};
} // namespace detail
using detail::atomic_queue_base;
} // namespace aq
#endif // ifndef ATOMIC_QUEUE_HPP_INCLUDED
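For reference, this is how I intend the class to be used (a minimal single-threaded sketch; it assumes the header above is saved as atomic_queue.hpp, and every pointer obtained from pop_front() must eventually be handed back to deallocate()):

#include <iostream>
#include "atomic_queue.hpp" // the header above; file name assumed

int main()
{
    aq::atomic_queue_base<int> queue;

    queue.push_back(1);
    queue.push_back(2);

    // pop_front() returns a raw pointer to the stored value, or nullptr if
    // the queue is empty; the caller owns it until deallocate() is called.
    while (int* value = queue.pop_front())
    {
        std::cout << *value << '\n';
        queue.deallocate(value);
    }
}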
- Note: By posting code on this site you are making it available under the Creative Commons license (see the bottom line on the page). (Loki Astari, Sep 20, 2012)
- @LokiAstari: Thanks for the info. The original license is BSD-style, and I don't really care as long as people can use it freely and I'm credited. (fat-lobyte, Sep 21, 2012)
2 Answers
Pretty sure this is broken in multi-threaded code:
do
{
// Point A.
if (!old_front) return nullptr;
new_front = old_front->next;
}
while(!front_.compare_exchange_weak(old_front, new_front));
What happens if:
- Thread A reaches point A.
- Thread A is de-scheduled and is not currently running.
- Thread B runs through the above code and succeeds.
- Thread B destroys the node it popped off the front.
- Thread A is re-scheduled to run.
- Thread A: any use of old_front is now invalid, as it points at a node that was deleted by Thread B.
- Thread A: even if the node was not destroyed, it is no longer the head of the list, so any usage of next is suspect.
- Thanks for looking through it! Technically you are right: the old_front node might be destroyed and I could read deallocated memory, which is likely to be garbage. BUT: the value read from old_front can be used if and only if front_ did not change (compare_exchange!). If front_ is the same, the old_front node must be alive and valid; if it is not the same, the value obtained is not written. I expect this to work as long as no compiler/processor starts complaining about reading deallocated memory. Do you see another alternative that performs acceptably? (fat-lobyte, Sep 21, 2012)
- Damn. There is another problem, however. Imagine the following situation: the queue has two nodes, Q1 and Q2, and Q1->next points to Q2. Thread A calls pop, reads old_front->next, and gets suspended. Then Thread B calls pop twice and succeeds. Thread B pushes another node that by chance gets allocated at the same address as Q1. Thread A gets resumed and, since front_ == old_front, sets front_ to Q2, which no longer exists, and BOOM. :-( Do you see a way around this? (fat-lobyte, Sep 21, 2012)
- Seems like I'm not the only one having this problem: en.wikipedia.org/wiki/ABA_problem (fat-lobyte, Sep 21, 2012)
- @fat-lobyte: You miss the point. When A resumes and does new_front = old_front->next; you have committed undefined behavior. It does not matter what happens next; you just broke your code. (Loki Astari, Sep 21, 2012)
- You are right, this code is broken. I'll have to go back "to the drawing board". Thanks for your help. (fat-lobyte, Sep 23, 2012)
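A common mitigation for the ABA problem mentioned in the comments above is to pair the head pointer with a generation counter that changes on every successful pop, so a recycled address no longer compares equal. The following is only a rough sketch (it is not from the answers, and it does not fix the unsafe read of old_front->next that the answer points out; that still needs hazard pointers or some other deferred-reclamation scheme):

#include <atomic>
#include <cstdint>

// Stand-in for detail::node<T>, reduced to the fields the sketch needs.
struct node { int value; node* next; };

// Head pointer paired with a counter that is bumped on every successful pop.
// A node address that has been freed and reused carries a different tag, so
// the compare_exchange below fails instead of silently succeeding (ABA).
struct tagged_head
{
    node*          ptr;
    std::uintptr_t tag;
};

std::atomic<tagged_head> front_{tagged_head{nullptr, 0}};

node* pop_front_tagged()
{
    tagged_head old_front = front_.load();
    tagged_head new_front;
    do {
        if (!old_front.ptr) return nullptr;
        // This read can still touch freed memory; the tag only prevents the
        // CAS from succeeding on a recycled address.
        new_front.ptr = old_front.ptr->next;
        new_front.tag = old_front.tag + 1;
    } while (!front_.compare_exchange_weak(old_front, new_front));
    return old_front.ptr;
}

Whether std::atomic<tagged_head> is actually lock-free depends on the platform; on x86-64 it typically requires a 16-byte compare-and-swap (e.g. compiling with -mcx16 on GCC/Clang).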
I think the design is not very useful (apart from any possible thread-safety issues). pop() just returns a pointer, and the user is trusted with eventually de-allocating it. This cannot be exception safe (memory is leaked when an exception is thrown and the pointer returned by pop() is lost). This makes the code useless in many situations where you want to use a thread-safe queue (and it is exactly what you don't want to do in C++; it smells like C).
Anthony Williams (in "C++ Concurrency in Action") implements a lock-free stack that returns a std::shared_ptr<T> from pop(). I have no experience with that, but implementing std::shared_ptr to be thread-safe and lock-free cannot be trivial (I don't know how good the implementations provided by the various standard libraries are).
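For illustration, here is the shape of the interface described above, in a mutex-based sketch rather than a real lock-free implementation (the names are made up; Williams' actual lock-free version is considerably more involved). The point is the exception safety of the contract: the allocation that could throw happens in push(), so pop() only has to move a smart pointer out and can never lose an element.

#include <memory>
#include <mutex>
#include <queue>

template <typename T>
class shared_queue
{
public:
    void push(T value)
    {
        // The only throwing operations happen before the queue is modified.
        auto data = std::make_shared<T>(std::move(value));
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(data));
    }

    // Returns nullptr when the queue is empty. Moving the shared_ptr out
    // cannot throw, so an element cannot be lost between removal and return.
    std::shared_ptr<T> pop()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty())
            return nullptr;
        std::shared_ptr<T> data = std::move(queue_.front());
        queue_.pop();
        return data;
    }

private:
    std::mutex                     mutex_;
    std::queue<std::shared_ptr<T>> queue_;
};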
- Hi, thanks for your review! I know that this is a little "raw", but I wanted to give maximum control with minimum requirements. By returning a pointer, the type does not have to be CopyConstructible or MoveConstructible, and it does not create the shared_ptr overhead. Returning a pointer and requiring the user to deallocate it themselves is the minimum required overhead. I figured that this was fair for a minimal implementation. Note that this class is called "atomic_queue_base" for a reason: a "real" implementation might derive from it and provide additional comforts. (fat-lobyte, Sep 30, 2012)
- How is this not thread safe? pop() is noexcept, so why can't the user just "watch out" in the client code? There's nothing that prevents the user from putting the deallocate() call into a try block. (fat-lobyte, Sep 30, 2012)
- @fat-lobyte I didn't say it's not thread safe. (Walter, Sep 30, 2012)