The background is the question at https://stackoverflow.com/questions/36634394/nested-shared-ptr-destruction-causes-stack-overflow. To summarize, for a data structure that looks like:
```cpp
// A node in a singly linked list.
struct Node {
    int head;
    std::shared_ptr<Node> tail;
};
```
If the list grows too long, a stack overflow might occur in the destructor of `Node` due to recursive destruction of the `shared_ptr`s. I'm implementing a thread-safe version of the delete engine proposed in https://stackoverflow.com/a/36635668/3234803. There are three simple classes involved: `SpinLock`, `ConcurrentQueue`, and `DeleteEngine`.
The `SpinLock` class implements the C++ Lockable interface with a `std::atomic_flag`:
```cpp
class SpinLock {
public:
    void lock()
    { while (lock_.test_and_set(std::memory_order_acquire)) {} }

    bool try_lock()  // Not camelCase, to be compatible with the C++ Lockable interface
    { return !lock_.test_and_set(std::memory_order_acquire); }

    void unlock()
    { lock_.clear(std::memory_order_release); }

private:
    std::atomic_flag lock_ = ATOMIC_FLAG_INIT;
};
```
The `ConcurrentQueue` class implements a multiple-producer, single-consumer queue that supports thread-safe `enqueue` and `tryDequeue`:
```cpp
template<typename T>
class ConcurrentQueue {
public:
    void enqueue(T item)
    {
        std::lock_guard<SpinLock> lk(lock_);
        queue_.emplace_back(std::move(item));
    }

    bool tryDequeue(T &item)
    {
        std::lock_guard<SpinLock> lk(lock_);
        if (queue_.empty()) {
            return false;
        }
        item = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

private:
    std::deque<T> queue_;
    SpinLock lock_;
};
```
The `DeleteEngine` class implements the delete engine proposed in the aforementioned answer:
```cpp
template<typename T>
class DeleteEngine {
public:
    ~DeleteEngine()
    {
        std::lock_guard<SpinLock> lk(deleting_);
        deleteAll();
    }

    void enqueue(T *p)
    {
        queue_.enqueue(p);
        if (deleting_.try_lock()) {
            std::lock_guard<SpinLock> lk(deleting_, std::adopt_lock);
            deleteAll();
        }
    }

private:
    void deleteAll()
    {
        T *p = nullptr;
        while (queue_.tryDequeue(p)) {
            delete p;
        }
    }

    ConcurrentQueue<T *> queue_;
    SpinLock deleting_;
};
```
Now in the deleter of a `shared_ptr<Node>`, we hand the raw pointer to a `DeleteEngine<Node>` instead of `delete`ing it directly. A recursive `delete` can handle only up to about 10000 nodes, while this method can handle an arbitrarily large number of nodes.
The above code is only a prototype, but I'm particularly concerned about the performance of this implementation: (1) most of the time the `Node` class would be used in a single-threaded environment; (2) occasionally it might be used in highly concurrent applications, e.g. a web server that constantly creates and destroys objects.
1 Answer
To address the locking overhead in the single-threaded case, my first advice is (as with all performance questions) to measure before optimising! It's likely that the locking overhead is tiny compared to the queue's memory management.
If you really find you need to streamline the locking, then you'll probably want to have two versions of the engine. The simplest way to do that is probably to add a boolean parameter to the template, and then use `if constexpr` at the points where the code differs. Or perhaps make the lock type a template parameter, and supply a no-op lock when creating the single-threaded version.
Comments on the answer point out a race in `DeleteEngine::enqueue`:

> In `DeleteEngine::enqueue`, there's no guarantee that a queued pointer will be seen by `deleteAll` before the lock is released (or even before `deleteAll` returns).

> Thread A calls `DeleteEngine::enqueue` and takes the lock, then starts deleting. Thread B now calls `enqueue` as well. Before it enqueues, though, `deleteAll` on thread A returns (but the lock is not released yet). Thread B then enqueues and sees the lock is still taken, so it returns. Thread A then releases the lock. The result: there is still an item in the queue that will not be deleted until the next call to `deleteAll`.