Concurrent Handle Table

Question 1

I'm writing a data structure to solve a problem related to servicing hardware interrupts. I have to translate a 64-bit pointer to a 16-bit handle and back again. Unfortunately the hardware completions aren't guaranteed to be in order, and some conventional solutions like lock-free stacks cause a lot of contention as the workload can be quite bursty.

I've created a concurrent handle class where:

Multiple threads can create handles, only one thread destroys. (MPSC)
Handles can be produced and destroyed "out of order".
Producer (Binding Handles) is lock-free.
Consumer (Reading and destroying handles) is wait-free for use in an IRQ context.
Iterate over active handles to handle stalled requests/debugging.

This is a simplified version of my first attempt (optimisations and error handling removed). The core idea is to probe an array of atomic values until it finds a free slot, then try to claim it. I'm mostly wanting feedback on if the memory barriers are correct.

#include <cstddef>
#include <array>
#include <atomic>
#include <utility>
using Handle = uint16_t;
struct HandleTable {
 HandleTable() { clear(); }
 ~HandleTable() { clear(); }
 // Tries to bind a pointer to a handle
 Handle try_bind(void* ptr) {
 // The virtual address is unique for a request in flight
 std::hash<uintptr_t> hash;
 uintptr_t offset = hash(reinterpret_cast<uintptr_t>(ptr));
 for(size_t i = 0; i < slots.size(); ++i) {
 size_t idx = (offset + i) % slots.size();
 // Is the current slot occupied (should this be acquire?)
 // Producers don't touch this object after submitting
 void* current = slots[idx].load(std::memory_order_relaxed);
 if(current) { continue; }
 // Try to claim the slot
 // Synchronize with "acquire" in getter functions
 if(slots[idx].compare_exchange_strong(current, ptr,
 std::memory_order_release, std::memory_order_relaxed)) {
 return Handle(idx + 1);
 }
 }
 return Handle(0);
 }
 // Destroys a handle (called from IRQ)
 void release(Handle val) {
 // Does this need to be memory_order_release?
 // We don't need to synchronize non-atomic data here
 slots[val + 1].store(nullptr, std::memory_order_relaxed);
 }
 // Gets the pointer referred to by handle
 void* get(Handle val) {
 // Synchronize with "release" in setter
 return slots[val + 1].load(std::memory_order_acquire);
 }
 // Clears the handle table (not thread safe)
 void clear() {
 for(size_t i = 0; i < slots.size(); ++i) {
 if(void* res = slots[i].load(std::memory_order_relaxed)) {
 // Synchronise "release" from setter
 std::atomic_thread_fence(std::memory_order_acquire);
 // Do something with the dangling handle
 }
 }
 // Syncrhonise any changes before clearing the table
 std::atomic_thread_fence(std::memory_order_release);
 for(auto& val : slots) {
 val.store(nullptr, std::memory_order_relaxed);
 }
 }
 std::array<std::atomic<void*>, 1024> slots;
};

Question 2

Consider not doing a `load()` in `try_bind()`

Depending on the platform you are running on, the load() might be a cheap way to check if a slot is occupied before doing the more heavy-weight compare_exchange_strong(), but if it is likely that the slot is not occupied (for example, you almost never have more than a handful of active handles), the load() is actually just adding unnecessary overhead.

Also consider using compare_exchange_weak(); your table will still work correctly even if there are spurious failures trying to do the exchange.

Memory order requirements

You are using std::memory_order_relaxed in many places, and depending on how handles are used, this might be fine. But it might be safer to use std::memory_order_acquire in try_bind() and std::memory_order_release in release(), so that there is no way that if a thread releases and another acquires a handle, there might be a small time window where they could have the same valid handle.

Unnecessary atomics in `clear()`?

You mention that clear() is not thread-safe, but then add fences, is there any reason for that?

Question 3

You're probably right about doing cas instead of test-and-cas. The only way this table will be at high occupancy is if there's back pressure on the hardware completions, which is a rate limiting condition regardless.

Question 4

clear() is not "thread-safe" in the sense that calling while threads are active will cause undefined behaviour. A full implementation will need to observe changes from other threads. The intended use case is to clean up after stalled hardware queues (rare but can happen). The acquire fence should be hoisted out of the loop though.

Question 5

For modifying the CAS operation, I assume acquire/release for success and acquire for failure? Or is relaxed on failure sufficient?

Question 6

In addition to G. Sliepen's review:

std::size_t is assumed to be declared in the global namespace as well as in std. That's an unsafe assumption.

We need to include <stdint.h> to define uint16_t and uintptr_t. That header is deprecated, so I recommend <cstdint>, which defines std::uint16_t and std::uintptr_t instead.

Instead of exposing the internals to everyone, I'd expect to see the slots array have private access. And although the array of (non-copyable) atomics will make the HandleTable non-copyable, I'd prefer to see that made explicit with suitable =delete declarations.

The result of hash() is a std::size_t, which isn't necessarily the same type as std::uintptr_t, so we should probably make offset the same type. It won't change the results, but it eliminates a possible warning.

I thought there might be a standard view to rotate slots by offset, but I couldn't find it. It might not simplify the code anyway, when we need to refer back to the underlying element to give the Handle value to return.

Given that Handle is 1-based (using 0 to act as a sentinel), it's quite reasonable to add 1 to the index before converting to Handle in try_bind(). But it looks wrong to be adding, rather than subtracting, 1 when making the opposite conversion (from Handle to index) in release() and get(). These methods could (should?) gracefully cope with being passed the null Handle, too.

Consider replacing get() with operator[] and/or at() to match std::array interface.

Calling clear() from the constructor is undefined behaviour, as the array elements are default-constructed (therefore not containing a valid object), and a default std::atomic_t is uninitialised. They should all be initialised using std::atomic_init():

 HandleTable() {
 for (auto& i: slots) {
 std::atomic_init(&i, nullptr);
 }
 }

The variable res in clear() is never used.

There's a comment // Do something with the dangling handle that doesn't correspond to any actual code. Is that a to-do that you forgot to implement? If we need to do some clean-up with those pointers, we should probably retain their actual type, and provide a means to callback into a suitable function.

Perhaps something like:

template<typename T = void, void(*Deleter)(T*) = [](T*){}>
class HandleTable
{
 std::array<std::atomic<T*>, 1024> slots = {};
public:
 HandleTable()
 {
 for (auto& slot: slots) {
 std::atomic_init(&slot, nullptr);
 }
 }
 ~HandleTable()
 {
 clear();
 }

 void clear()
 {
 for (auto& slot: slots) {
 if (T* res = slot.exchange(nullptr, std::memory_order_relaxed)) {
 Deleter(res);
 }
 }
 }
};

Obviously one could change the template's default for Deleter - perhaps to std::default_deleter<T> if we expect the entries to be allocated with new.

And the table size really ought to be configurable.

I'm not sure whether users really need clear(), or whether its functionality could just be inlined into the destructor.

Question 7

Neither std::atomic<> nor std::array do value-initialization, so the clear() is necessary, I think?

Question 8

In that case, isn't it broken (using the contents before initialisation)? The constructor really ought to std::atomic_init the array elements instead. Or aggregate-initialise the array, but that would be tedious...

Question 9

I've updated to show what the constructor should be doing instead.

Question 10

I introduced the (handle + 1) bug stripping the code for code review, but I have no excuse for the undefined behaviour in the ctor!

Question 11

clear() should be modified to take in a functor to process each dangling entry. A fleshed out class would probably have HandleTable::at(size_t) -> Optional<T> and HandleTable::operator[](size_t) -> T for the getters to match the usual container requirements. .

G. Sliepen G. Sliepen 68.7k3 gold badges74 silver badges179 bronze badges · Answer 1 · 2022-10-25 12:29:19Z