I'm writing a data structure to solve a problem related to servicing hardware interrupts. I have to translate a 64-bit pointer to a 16-bit handle and back again. Unfortunately the hardware completions aren't guaranteed to be in order, and some conventional solutions like lock-free stacks cause a lot of contention as the workload can be quite bursty.
I've created a concurrent handle class where:
- Multiple threads can create handles, only one thread destroys. (MPSC)
- Handles can be produced and destroyed "out of order".
- Producer (Binding Handles) is lock-free.
- Consumer (Reading and destroying handles) is wait-free for use in an IRQ context.
- Iterate over active handles to handle stalled requests/debugging.
This is a simplified version of my first attempt (optimisations and error handling removed). The core idea is to probe an array of atomic values until it finds a free slot, then try to claim it. I'm mostly wanting feedback on if the memory barriers are correct.
#include <cstddef>
#include <array>
#include <atomic>
#include <utility>
using Handle = uint16_t;
struct HandleTable {
HandleTable() { clear(); }
~HandleTable() { clear(); }
// Tries to bind a pointer to a handle
Handle try_bind(void* ptr) {
// The virtual address is unique for a request in flight
std::hash<uintptr_t> hash;
uintptr_t offset = hash(reinterpret_cast<uintptr_t>(ptr));
for(size_t i = 0; i < slots.size(); ++i) {
size_t idx = (offset + i) % slots.size();
// Is the current slot occupied (should this be acquire?)
// Producers don't touch this object after submitting
void* current = slots[idx].load(std::memory_order_relaxed);
if(current) { continue; }
// Try to claim the slot
// Synchronize with "acquire" in getter functions
if(slots[idx].compare_exchange_strong(current, ptr,
std::memory_order_release, std::memory_order_relaxed)) {
return Handle(idx + 1);
}
}
return Handle(0);
}
// Destroys a handle (called from IRQ)
void release(Handle val) {
// Does this need to be memory_order_release?
// We don't need to synchronize non-atomic data here
slots[val + 1].store(nullptr, std::memory_order_relaxed);
}
// Gets the pointer referred to by handle
void* get(Handle val) {
// Synchronize with "release" in setter
return slots[val + 1].load(std::memory_order_acquire);
}
// Clears the handle table (not thread safe)
void clear() {
for(size_t i = 0; i < slots.size(); ++i) {
if(void* res = slots[i].load(std::memory_order_relaxed)) {
// Synchronise "release" from setter
std::atomic_thread_fence(std::memory_order_acquire);
// Do something with the dangling handle
}
}
// Syncrhonise any changes before clearing the table
std::atomic_thread_fence(std::memory_order_release);
for(auto& val : slots) {
val.store(nullptr, std::memory_order_relaxed);
}
}
std::array<std::atomic<void*>, 1024> slots;
};
2 Answers 2
Consider not doing a load()
in try_bind()
Depending on the platform you are running on, the load()
might be a cheap way to check if a slot is occupied before doing the more heavy-weight compare_exchange_strong()
, but if it is likely that the slot is not occupied (for example, you almost never have more than a handful of active handles), the load()
is actually just adding unnecessary overhead.
Also consider using compare_exchange_weak()
; your table will still work correctly even if there are spurious failures trying to do the exchange.
Memory order requirements
You are using std::memory_order_relaxed
in many places, and depending on how handles are used, this might be fine. But it might be safer to use std::memory_order_acquire
in try_bind()
and std::memory_order_release
in release()
, so that there is no way that if a thread releases and another acquires a handle, there might be a small time window where they could have the same valid handle.
Unnecessary atomics in clear()
?
You mention that clear()
is not thread-safe, but then add fences, is there any reason for that?
-
\$\begingroup\$ You're probably right about doing cas instead of test-and-cas. The only way this table will be at high occupancy is if there's back pressure on the hardware completions, which is a rate limiting condition regardless. \$\endgroup\$BlamKiwi– BlamKiwi2022年10月25日 21:09:01 +00:00Commented Oct 25, 2022 at 21:09
-
\$\begingroup\$ clear() is not "thread-safe" in the sense that calling while threads are active will cause undefined behaviour. A full implementation will need to observe changes from other threads. The intended use case is to clean up after stalled hardware queues (rare but can happen). The acquire fence should be hoisted out of the loop though. \$\endgroup\$BlamKiwi– BlamKiwi2022年10月25日 21:22:04 +00:00Commented Oct 25, 2022 at 21:22
-
\$\begingroup\$ For modifying the CAS operation, I assume acquire/release for success and acquire for failure? Or is relaxed on failure sufficient? \$\endgroup\$BlamKiwi– BlamKiwi2022年10月25日 22:18:40 +00:00Commented Oct 25, 2022 at 22:18
In addition to G. Sliepen's review:
std::size_t
is assumed to be declared in the global namespace as well as in std
. That's an unsafe assumption.
We need to include <stdint.h>
to define uint16_t
and uintptr_t
. That header is deprecated, so I recommend <cstdint>
, which defines std::uint16_t
and std::uintptr_t
instead.
Instead of exposing the internals to everyone, I'd expect to see the slots
array have private
access. And although the array of (non-copyable) atomics will make the HandleTable
non-copyable, I'd prefer to see that made explicit with suitable =delete
declarations.
The result of hash()
is a std::size_t
, which isn't necessarily the same type as std::uintptr_t
, so we should probably make offset
the same type. It won't change the results, but it eliminates a possible warning.
I thought there might be a standard view to rotate slots
by offset
, but I couldn't find it. It might not simplify the code anyway, when we need to refer back to the underlying element to give the Handle
value to return.
Given that Handle
is 1-based (using 0 to act as a sentinel), it's quite reasonable to add 1 to the index before converting to Handle
in try_bind()
. But it looks wrong to be adding, rather than subtracting, 1 when making the opposite conversion (from Handle
to index) in release()
and get()
. These methods could (should?) gracefully cope with being passed the null Handle
, too.
Consider replacing get()
with operator[]
and/or at()
to match std::array
interface.
Calling clear()
from the constructor is undefined behaviour, as the array elements are default-constructed (therefore not containing a valid object), and a default std::atomic_t
is uninitialised. They should all be initialised using std::atomic_init()
:
HandleTable() {
for (auto& i: slots) {
std::atomic_init(&i, nullptr);
}
}
The variable res
in clear()
is never used.
There's a comment // Do something with the dangling handle
that doesn't correspond to any actual code. Is that a to-do that you forgot to implement? If we need to do some clean-up with those pointers, we should probably retain their actual type, and provide a means to callback into a suitable function.
Perhaps something like:
template<typename T = void, void(*Deleter)(T*) = [](T*){}>
class HandleTable
{
std::array<std::atomic<T*>, 1024> slots = {};
public:
HandleTable()
{
for (auto& slot: slots) {
std::atomic_init(&slot, nullptr);
}
}
~HandleTable()
{
clear();
}
void clear()
{
for (auto& slot: slots) {
if (T* res = slot.exchange(nullptr, std::memory_order_relaxed)) {
Deleter(res);
}
}
}
};
Obviously one could change the template's default for Deleter
- perhaps to std::default_deleter<T>
if we expect the entries to be allocated with new
.
And the table size really ought to be configurable.
I'm not sure whether users really need clear()
, or whether its functionality could just be inlined into the destructor.
-
\$\begingroup\$ Neither
std::atomic<>
norstd::array
do value-initialization, so theclear()
is necessary, I think? \$\endgroup\$G. Sliepen– G. Sliepen2022年10月25日 15:04:15 +00:00Commented Oct 25, 2022 at 15:04 -
1\$\begingroup\$ In that case, isn't it broken (using the contents before initialisation)? The constructor really ought to
std::atomic_init
the array elements instead. Or aggregate-initialise the array, but that would be tedious... \$\endgroup\$Toby Speight– Toby Speight2022年10月25日 16:38:16 +00:00Commented Oct 25, 2022 at 16:38 -
\$\begingroup\$ I've updated to show what the constructor should be doing instead. \$\endgroup\$Toby Speight– Toby Speight2022年10月25日 16:51:22 +00:00Commented Oct 25, 2022 at 16:51
-
\$\begingroup\$ I introduced the (handle + 1) bug stripping the code for code review, but I have no excuse for the undefined behaviour in the ctor! \$\endgroup\$BlamKiwi– BlamKiwi2022年10月25日 21:26:00 +00:00Commented Oct 25, 2022 at 21:26
-
\$\begingroup\$ clear() should be modified to take in a functor to process each dangling entry. A fleshed out class would probably have HandleTable::at(size_t) -> Optional<T> and HandleTable::operator[](size_t) -> T for the getters to match the usual container requirements. . \$\endgroup\$BlamKiwi– BlamKiwi2022年10月25日 21:39:39 +00:00Commented Oct 25, 2022 at 21:39