I need a concurrent data structure with the following properties:
- Low overhead for enumeration.
- Insert only at the end.
- Removal from any point.
- Enumeration should be safe with concurrent writing.
- Enumeration does not require the list to be in a constant state.
- It is not a problem if I enumerate over removed items.
I have designed the following linked list class which should meet my requirements.
```csharp
using System.Collections;
using System.Collections.Generic;

internal class ConcurrentLinkedList<T> : IEnumerable<Node<T>> {
    private readonly BaseNode<T> _root;
    private BaseNode<T> _end;

    public ConcurrentLinkedList() {
        _end = _root = new BaseNode<T>();
    }

    public Node<T> Insert(T value) {
        var node = new Node<T>(value);
        lock (node) {
            lock (_end) {
                node.Previous = _end;
                _end = _end.Next = node;
                return node;
            }
        }
    }

    public void Remove(Node<T> node) {
        lock (node.Previous) {
            lock (node) {
                if (node.Next == null) {
                    if (node.Previous.Next != node) return;
                    node.Previous.Next = null;
                    if (_end == node) _end = node.Previous;
                } else {
                    lock (node.Next) {
                        if (node.Previous.Next == node) node.Previous.Next = node.Next;
                        if (node.Next.Previous == node) node.Next.Previous = node.Previous;
                    }
                }
            }
        }
    }

    public IEnumerator<Node<T>> GetEnumerator() {
        for (var current = _root.Next; current != null; current = current.Next) {
            yield return current;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}

internal class BaseNode<T> {
    internal volatile Node<T> Next;
}

internal class Node<T> : BaseNode<T> {
    internal BaseNode<T> Previous;
    public T Value { get; }

    public Node(T value) {
        Value = value;
    }
}
```
I am aware the class is very easy to misuse, but as it is only for internal use I am not concerned about this. I am considering making all the classes private nested classes to ensure that there is no accidental misuse.
In the code all the writing is done inside a lock, but there are shared volatile fields which are read outside of any lock. Apart from the possibility of enumerating deleted nodes, is there any other danger in this?
For the locking, I lock only the nodes being changed. This should allow other parts of the list to be concurrently modified; the list can get long. Is this sufficient?
To avoid deadlock I always lock from the earliest node to the latest.
Is the provided locking sufficient to protect the `_end` field? I believe that, according to the logic used, the current `_end` will always be locked when the value of `_end` is changed. Is this sufficient?
Is this code trying to be too clever, and would it be better to either:
- have one write lock and continue using the volatile fields for reading, or
- use a standard `ReaderWriterLockSlim` for the whole list?
1 Answer
EDIT: Possible race condition
I would not accept this code into production without proper documentation containing a good proof that it works as expected. `lock (node.Previous)` could get interrupted by another `Remove` changing `node.Previous` after it was passed to `Monitor.Enter`, which smells too much (it will lock the removed node instead of the one it should). Changing the order of locks may introduce deadlock (starvation).
Let us start with `Insert(1); Insert(2); Insert(3)` and then call `Remove` from three different threads. The 3rd thread will call `lock (node.Previous)`, which consists of two parts: first fetching `node.Previous` and then calling `Monitor.Enter`. Let us suspend this `Remove(3)` in between, after fetching `Node(2)` but before calling `Monitor.Enter` (or suspend inside, before it can do anything). Let us now finish `Remove(2)`, which will change `Node(3).Previous = Node(1)`.
Let us continue the 3rd thread, which will lock the removed `Node(2)` (instead of `Node(1)`) and `Node(3)`, reaching the line `node.Previous.Next = null`. It won't crash here, because `lock` introduces a memory barrier (thus `node.Previous` will truly read `Node(1)`, not the locked `Node(2)`). Let us suspend the 3rd thread here and start the 1st thread calling `Remove(1)`, locking `_root` and `Node(1)`, passing the `if (node.Next == null)` check and going to the `else` branch's `lock (node.Next)`. Let us now suspend this thread and continue in the 3rd thread, which was about to do `node.Previous.Next = null`. The compiler will probably have cached `node.Next` in the 1st thread, but what if it has not? `lock (null)`.
We could rewrite it (`var next = node.Next; if (next == null) { ... } else lock (next)`); now the 1st thread will block until the 3rd finishes, but it is still left with `node.Next == null`, possibly crashing in `if (node.Next.Previous == node)`. And even if we change that to `if (next.Previous == node) next.Previous = ...`, it changes the state of a removed node, which you could then try to remove again... and it is all starting to look like it is not worth the effort.
A single `ReaderWriterLockSlim` is safe to use, and the performance will still probably be good enough. Using the lock only for writing (`Insert`, `Remove`) and leaving `GetEnumerator` without any locking may still be possible; read my original answer (3rd point).
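A minimal sketch of that single-lock alternative (hypothetical wrapper type and names, not code from the question), assuming a plain `LinkedList<T>` guarded by one `ReaderWriterLockSlim`; enumeration here snapshots under the read lock rather than walking lock-free:

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: guard a plain LinkedList<T> with a single
// ReaderWriterLockSlim instead of per-node locks.
internal sealed class LockedLinkedList<T>
{
    private readonly LinkedList<T> _list = new LinkedList<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public LinkedListNode<T> Insert(T value)
    {
        _lock.EnterWriteLock();
        try { return _list.AddLast(value); }
        finally { _lock.ExitWriteLock(); }
    }

    public void Remove(LinkedListNode<T> node)
    {
        _lock.EnterWriteLock();
        // Guard against nodes that belong to another list (or were
        // already removed); LinkedList<T>.Remove would throw otherwise.
        try { if (node.List == _list) _list.Remove(node); }
        finally { _lock.ExitWriteLock(); }
    }

    // Copy under the read lock so writers cannot corrupt the walk;
    // the caller then enumerates the copy without holding any lock.
    public List<T> Snapshot()
    {
        _lock.EnterReadLock();
        try { return new List<T>(_list); }
        finally { _lock.ExitReadLock(); }
    }
}
```

With the lock held only briefly per operation, contention should be acceptable unless the list is enormous, and `ReaderWriterLockSlim` allows many concurrent snapshots.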
Original answer
I will leave my original answer here as there may still be some valuable information in it, but I do not trust the locking now. It won't deadlock, and although I cannot prove a race condition corrupting the state (not one that could not be fixed), I myself would never use such a collection unless somebody can provide an unbreakable proof that it is safe.
My reasoning why the locking is safe
- All the locks are taken in the same order (from root to end, forward direction), which means removal of newer nodes can block threads that are currently trying to remove an older node, but there is no loop for possible starvation: the thread removing the newer node will block the other temporarily, do its job, and then the other can continue. There is a small inconsistency in `Insert`, where you first lock the node that will become the last and then lock the last node (which is the reversed order of locking), but I believe you don't even have to lock the inserted node, because it won't be visible until `_end = _end.Next = node;`.
- You always lock `_end` before changing it (the node `_end` points to). You also hold the lock of the node that will become `_end` after the operation (`_end.Previous` if removing; not necessary in `Insert`, because nobody can see the node until you publish it by `_end.Next = node`, and then only `GetEnumerator()` can see it; `Remove` would need to lock `_end` first, which it cannot until you finish inserting).
- `GetEnumerator` does not need any lock, because removed nodes retain their `Next` (which is `internal`; some `private` nested `interface` could maybe protect it even better, a bit like `friend` in C++, but `internal` appears to be sufficient for your use case). That means that even if you remove the node that is `current` in `GetEnumerator`, you can still safely iterate. Nobody can reinsert the node or modify `Next`. `current.Next` is always a node that at least existed when you started iterating (as long as pointer assignment is atomic).
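To illustrate the 3rd point, here is a hypothetical single-threaded toy version of the list (names invented, locking omitted): because a removed node keeps its `Next` reference, an enumerator currently standing on it can still finish the walk.

```csharp
using System.Collections.Generic;

// Toy sketch, not the reviewed class: just enough structure to show
// that a removed node keeps its Next pointer, so an in-flight
// enumerator positioned on it is not stranded.
internal sealed class ToyList<T>
{
    internal sealed class Node
    {
        internal Node Next;
        internal Node Previous;
        public T Value;
    }

    private readonly Node _root = new Node();
    private Node _end;

    public ToyList() { _end = _root; }

    public Node Insert(T value)
    {
        var node = new Node { Value = value, Previous = _end };
        _end = _end.Next = node;
        return node;
    }

    // Unlink the node but leave node.Next untouched, as in the
    // reviewed code.
    public void Remove(Node node)
    {
        node.Previous.Next = node.Next;
        if (node.Next == null) _end = node.Previous;
        else node.Next.Previous = node.Previous;
    }

    public IEnumerable<T> Values()
    {
        for (var current = _root.Next; current != null; current = current.Next)
            yield return current.Value;
    }
}
```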
Lines I deem unnecessary / redundant
- `lock (node)` inside `Insert`: already described; `lock (_end)` is enough (and the order should be `lock (_end) lock (node)` to preserve the forward order of locking).
- `if (node.Previous.Next != node) return;` inside `Remove`. How could that happen? Not in normal usage, but only if we consider that you could possibly call `Remove` multiple times for the same node. In that case, I would mark removed nodes by `Previous = null`, and to make `Remove` not corrupt the state if you pass a node from a different list, I would change `_root + _end` to a single doubly-linked node (add `Previous` to `BaseNode`) and make the list hybrid: null-terminated in the forward direction and root-terminated in the backward direction (cyclic, starting with `_root.Previous = _root`). That can be implemented in such a way that `Remove` could be `static` (not that it should be, but it could; it does not need to access `_root`, all it needs is the `node`). That means `Remove` could safely remove nodes from any list.
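A sketch of the `Previous = null` marking suggested above (a hypothetical single-threaded toy, not the full hybrid/cyclic design): the second `Remove` of the same node becomes a harmless no-op.

```csharp
// Toy sketch: a removed node is marked by Previous = null, making
// Remove idempotent. The real class would still need its locking.
internal sealed class MarkedList<T>
{
    internal sealed class Node
    {
        internal Node Next;
        internal Node Previous;
        public T Value;
    }

    private readonly Node _root = new Node();
    private Node _end;

    public MarkedList() { _end = _root; }

    public Node Insert(T value)
    {
        var node = new Node { Value = value, Previous = _end };
        _end = _end.Next = node;
        return node;
    }

    public void Remove(Node node)
    {
        if (node.Previous == null) return; // already removed: no-op
        node.Previous.Next = node.Next;
        if (node.Next == null) _end = node.Previous;
        else node.Next.Previous = node.Previous;
        node.Previous = null; // mark as removed
    }

    public int Count()
    {
        var n = 0;
        for (var c = _root.Next; c != null; c = c.Next) n++;
        return n;
    }
}
```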
Style / design / naming
`Node` is too generic a name for the global space (and you should use a `namespace`). I would make it a `public` nested class (I hope you can still make the list implement `IEnumerable<Node>`), have `Next` and `Previous` be `private`, and add a `private interface` to `Node` with `Insert` and `Remove` that will be called by the list (with `Insert` only as `Insert => _root.Insert(value)` and `Remove => ((INode)node).Remove()`, while having `INode _root` as well as `Next` and `Previous`) to protect everybody, including yourself, from modifying `Next`/`Previous` in an unacceptable way.
P.S.: Having `Next` typed as `Node` (as you currently have) may be better for performance (needing `((INode)_root).Insert(value)`, which is a safe cast the compiler knows is valid). The originally proposed version (having an `INode`-typed `Next`) would need casting in `GetEnumerator` that would have to be runtime-checked.
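A rough sketch of this nested-class design (hypothetical names, locking omitted for brevity): the `private interface` gives the list, and only the list, access to the link fields, a bit like `friend` in C++.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch of the suggested design: Node is public and
// enumerable, but its link fields are reachable only through a
// private interface that outside code cannot name.
public sealed class LinkedList2<T> : IEnumerable<LinkedList2<T>.Node>
{
    private interface INode
    {
        Node Next { get; set; }
        Node Previous { get; set; }
    }

    public sealed class Node : INode
    {
        private Node _next;
        private Node _previous;
        Node INode.Next { get => _next; set => _next = value; }
        Node INode.Previous { get => _previous; set => _previous = value; }
        public T Value { get; internal set; }
    }

    private readonly Node _root = new Node();
    private Node _end;

    public LinkedList2() { _end = _root; }

    public Node Insert(T value)
    {
        var node = new Node { Value = value };
        ((INode)node).Previous = _end;
        ((INode)_end).Next = node; // safe cast, compiler knows it is valid
        _end = node;
        return node;
    }

    public void Remove(Node node)
    {
        var prev = ((INode)node).Previous;
        var next = ((INode)node).Next;
        if (prev != null) ((INode)prev).Next = next;
        if (next != null) ((INode)next).Previous = prev;
        if (_end == node) _end = prev;
    }

    public IEnumerator<Node> GetEnumerator()
    {
        for (var current = ((INode)_root).Next; current != null;
             current = ((INode)current).Next)
            yield return current;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```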
- P.P.S.: Document your code; describe what it does and how it does it. There are many doubly-linked lists (null-terminated, cyclic, and hybrid, and each can have zero, one, or two embedded nodes, which together means 9 possible implementations; not all combinations are used, but that is still a lot of possible versions). (user52292, Nov 15, 2018)
- I have a very bad feeling about `lock (node.Previous)` (a concurrent `Remove` could change it), but I also think that this lock is actually redundant and that `lock (node) if (node.Next != null) lock (node.Next)` saves the code. An alternative could be `lock (node) lock (node.Previous)`, which should also work. I will give it some more time; this is not trivial (so the answer could as well be: don't try to be too clever, use safer but surely secure locking). (user52292, Nov 15, 2018)
- ... `_end` points to the same object as `_root`. `_end.Next` is assigned to.
- ... `readonly _root` is the only internally persistent node; `_end` is just a pointer to the last node. My apologies.