I need a concurrent data structure with the following properties:
- Low overhead for enumeration.
- Insert only at the end.
- Removal from any point.
- Enumeration should be safe with concurrent writing.
- Enumeration does not require the list to be in a constant state.
- It is not a problem if I enumerate over removed items.
I have designed the following linked list class which should meet my requirements.
```csharp
using System.Collections;
using System.Collections.Generic;

internal class ConcurrentLinkedList<T> : IEnumerable<Node<T>> {
    private readonly BaseNode<T> _root;
    private BaseNode<T> _end;

    public ConcurrentLinkedList() {
        _end = _root = new BaseNode<T>();
    }

    public Node<T> Insert(T value) {
        var node = new Node<T>(value);
        lock (node) {
            lock (_end) {
                node.Previous = _end;
                _end = _end.Next = node;
                return node;
            }
        }
    }

    public void Remove(Node<T> node) {
        lock (node.Previous) {
            lock (node) {
                if (node.Next == null) {
                    if (node.Previous.Next != node) return;
                    node.Previous.Next = null;
                    if (_end == node) _end = node.Previous;
                } else {
                    lock (node.Next) {
                        if (node.Previous.Next == node) node.Previous.Next = node.Next;
                        if (node.Next.Previous == node) node.Next.Previous = node.Previous;
                    }
                }
            }
        }
    }

    public IEnumerator<Node<T>> GetEnumerator() {
        for (var current = _root.Next; current != null; current = current.Next) {
            yield return current;
        }
    }

    IEnumerator IEnumerable.GetEnumerator() {
        return GetEnumerator();
    }
}

internal class BaseNode<T> {
    internal volatile Node<T> Next;
}

internal class Node<T> : BaseNode<T> {
    internal BaseNode<T> Previous;
    public T Value { get; }

    public Node(T value) {
        Value = value;
    }
}
```
I am aware the class is very easy to misuse, but as it is only for internal use I am not concerned about this. I am considering making all the classes private nested classes to ensure that there is no accidental misuse.
In the code all the writing is done inside a lock, but there are shared volatile fields which are read outside of any lock. Apart from the possibility of enumerating deleted nodes, is there any other danger in this?
For the locking, I lock only the nodes being changed. This should allow other parts of the list to be concurrently modified; the list can get long. Is this sufficient?
To avoid deadlock I always lock from the earliest node to the latest.
Is the provided locking sufficient to protect the `_end` field? I believe that, according to the logic used, the current `_end` will always be locked when the value of `_end` is changed. Is this sufficient?
Is this code trying to be too clever, and would it be better to either:
- have one write lock and continue using the volatile fields for reading, or
- use a standard `ReaderWriterLockSlim` for the whole list?
1 Answer
EDIT: Possible race condition
I would not accept this code into production without proper documentation containing a good proof that it works as expected. `lock (node.Previous)` could get interrupted by another `Remove` changing `node.Previous` after it was passed to `Monitor.Enter`, which smells too much (it will lock the removed node instead of the one it should). Changing the order of locks may introduce deadlock (starvation).
Let us start with `Insert(1); Insert(2); Insert(3)` and then call `Remove` from three different threads. The 3rd thread will call `lock (node.Previous)`, which consists of two parts: first fetching `node.Previous` and then calling `Monitor.Enter`. Let us suspend this `Remove(3)` in between, after fetching `Node(2)` but before calling `Monitor.Enter` (or suspend inside, before it can do anything). Let us now finish `Remove(2)`, which will change `Node(3).Previous = Node(1)`.
Let us continue the 3rd thread, which will lock the removed `Node(2)` (instead of `Node(1)`) and `Node(3)`, reaching the line `node.Previous.Next = null`. It won't crash here, because `lock` introduces a memory barrier (thus `node.Previous` will truly read `Node(1)`, not the locked `Node(2)`). Let us suspend the 3rd thread here and start the 1st thread calling `Remove(1)`, locking `_root` and `Node(1)`, passing the `if (node.Next == null)` check and going to the `else` branch's `lock (node.Next)`. Let us now suspend this thread and continue in the 3rd thread, which was about to do `node.Previous.Next = null`. The compiler will probably have cached `node.Next` in the 1st thread, but what if it has not? `lock (null)`.
We could rewrite it (`var next = node.Next; if (next == null) { ... } else lock (next)`); now the 1st thread will block until the 3rd finishes, but it is still left with `node.Next == null`, possibly crashing in `if (node.Next.Previous == node)`. And even if we change that to `if (next.Previous == node) next.Previous = ...`, it changes the state of a removed node, which you could then try to remove again... and it is all starting to look like it is not worth the effort.
A single `ReaderWriterLockSlim` is safe to use, and the performance will still probably be good enough. Using the lock only for writing (`Insert`, `Remove`) and leaving `GetEnumerator` without any locking may still be possible; read my original answer (3rd point).
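A minimal sketch of that single-lock alternative (hypothetical wrapper type and names, not code from the question), assuming a plain `LinkedList<T>` guarded by one `ReaderWriterLockSlim`; enumeration here snapshots under the read lock rather than walking lock-free:

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: guard a plain LinkedList<T> with a single
// ReaderWriterLockSlim instead of per-node locks.
internal sealed class LockedLinkedList<T>
{
    private readonly LinkedList<T> _list = new LinkedList<T>();
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();

    public LinkedListNode<T> Insert(T value)
    {
        _lock.EnterWriteLock();
        try { return _list.AddLast(value); }
        finally { _lock.ExitWriteLock(); }
    }

    public void Remove(LinkedListNode<T> node)
    {
        _lock.EnterWriteLock();
        // Guard against nodes that belong to another list (or were
        // already removed); LinkedList<T>.Remove would throw otherwise.
        try { if (node.List == _list) _list.Remove(node); }
        finally { _lock.ExitWriteLock(); }
    }

    // Copy under the read lock so writers cannot corrupt the walk;
    // the caller then enumerates the copy without holding any lock.
    public List<T> Snapshot()
    {
        _lock.EnterReadLock();
        try { return new List<T>(_list); }
        finally { _lock.ExitReadLock(); }
    }
}
```

With the lock held only briefly per operation, contention should be acceptable unless the list is enormous, and `ReaderWriterLockSlim` allows many concurrent snapshots.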
Original answer
I will leave my original answer here as there may still be some valuable information in it, but I do not trust the locking now. It won't deadlock, and although I cannot prove a race condition corrupting the state (not one that could not be fixed), I myself would never use such a collection unless somebody can provide an unbreakable proof that it is safe.
My reasoning why the locking is safe
- All the locks are taken in the same order (from root to end, forward direction), which means removal of newer nodes can block threads that are currently trying to remove an older node, but there is no loop for possible starvation: the thread removing the newer node will block the other temporarily, do its job, and then the other can continue. There is a small inconsistency in `Insert`, where you first lock the node that will become the last and then lock the last node (which is the reversed order of locking), but I believe you don't even have to lock the inserted node, because it won't be visible until `_end = _end.Next = node;`.
- You always lock `_end` before changing it (the node `_end` points to). You also hold the lock of the node that will become `_end` after the operation (`_end.Previous` if removing; not necessary in `Insert`, because nobody can see the node until you publish it by `_end.Next = node`, and then only `GetEnumerator()` can see it; `Remove` would need to lock `_end` first, which it cannot until you finish inserting).
- `GetEnumerator` does not need any lock, because removed nodes retain their `Next` (which is `internal`; some `private` nested `interface` could maybe protect it even better, a bit like `friend` in C++, but `internal` appears to be sufficient for your use case). That means that even if you remove the node that is `current` in `GetEnumerator`, you can still safely iterate. Nobody can reinsert the node or modify `Next`. `current.Next` is always a node that at least existed when you started iterating (as long as pointer assignment is atomic).
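To illustrate the 3rd point, here is a hypothetical single-threaded toy version of the list (names invented, locking omitted): because a removed node keeps its `Next` reference, an enumerator currently standing on it can still finish the walk.

```csharp
using System.Collections.Generic;

// Toy sketch, not the reviewed class: just enough structure to show
// that a removed node keeps its Next pointer, so an in-flight
// enumerator positioned on it is not stranded.
internal sealed class ToyList<T>
{
    internal sealed class Node
    {
        internal Node Next;
        internal Node Previous;
        public T Value;
    }

    private readonly Node _root = new Node();
    private Node _end;

    public ToyList() { _end = _root; }

    public Node Insert(T value)
    {
        var node = new Node { Value = value, Previous = _end };
        _end = _end.Next = node;
        return node;
    }

    // Unlink the node but leave node.Next untouched, as in the
    // reviewed code.
    public void Remove(Node node)
    {
        node.Previous.Next = node.Next;
        if (node.Next == null) _end = node.Previous;
        else node.Next.Previous = node.Previous;
    }

    public IEnumerable<T> Values()
    {
        for (var current = _root.Next; current != null; current = current.Next)
            yield return current.Value;
    }
}
```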
Lines I deem unnecessary / redundant
- `lock (node)` inside `Insert`: already described; `lock (_end)` is enough (and the order should be `lock (_end) lock (node)` to preserve the forward order of locking).
- `if (node.Previous.Next != node) return;` inside `Remove`. How could that happen? Not in normal usage, but only if we consider that you could possibly call `Remove` multiple times for the same node. In that case, I would mark removed nodes by `Previous = null`, and to make `Remove` not corrupt the state if you pass a node from a different list, I would change `_root + _end` to a single doubly-linked node (add `Previous` to `BaseNode`) and make the list hybrid: null-terminated in the forward direction and root-terminated in the backward direction (cyclic, starting with `_root.Previous = _root`). That can be implemented in such a way that `Remove` could be `static` (not that it should be, but it could; it does not need to access `_root`, all it needs is the `node`). That means `Remove` could safely remove nodes from any list.
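A sketch of the `Previous = null` marking suggested above (a hypothetical single-threaded toy, not the full hybrid/cyclic design): the second `Remove` of the same node becomes a harmless no-op.

```csharp
// Toy sketch: a removed node is marked by Previous = null, making
// Remove idempotent. The real class would still need its locking.
internal sealed class MarkedList<T>
{
    internal sealed class Node
    {
        internal Node Next;
        internal Node Previous;
        public T Value;
    }

    private readonly Node _root = new Node();
    private Node _end;

    public MarkedList() { _end = _root; }

    public Node Insert(T value)
    {
        var node = new Node { Value = value, Previous = _end };
        _end = _end.Next = node;
        return node;
    }

    public void Remove(Node node)
    {
        if (node.Previous == null) return; // already removed: no-op
        node.Previous.Next = node.Next;
        if (node.Next == null) _end = node.Previous;
        else node.Next.Previous = node.Previous;
        node.Previous = null; // mark as removed
    }

    public int Count()
    {
        var n = 0;
        for (var c = _root.Next; c != null; c = c.Next) n++;
        return n;
    }
}
```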
Style / design / naming
`Node` is too generic a name for the global space (and you should use a `namespace`). I would make it a `public` nested class (I hope you can still make the list implement `IEnumerable<Node>`), have `Next` and `Previous` be `private`, and add a `private interface` to `Node` with `Insert` and `Remove` that will be called by the list (with `Insert` only as `Insert => _root.Insert(value)` and `Remove => ((INode)node).Remove()`, while having `INode _root` as well as `Next` and `Previous`) to protect everybody, including yourself, from modifying `Next`/`Previous` in an unacceptable way.
P.S.: Having `Next` typed as `Node` (as you currently have) may be better for performance (needing `((INode)_root).Insert(value)`, which is a safe cast the compiler knows is valid). The originally proposed version (having an `INode`-typed `Next`) would need casting in `GetEnumerator` that would have to be runtime-checked.
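A rough sketch of this nested-class design (hypothetical names, locking omitted for brevity): the `private interface` gives the list, and only the list, access to the link fields, a bit like `friend` in C++.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical sketch of the suggested design: Node is public and
// enumerable, but its link fields are reachable only through a
// private interface that outside code cannot name.
public sealed class LinkedList2<T> : IEnumerable<LinkedList2<T>.Node>
{
    private interface INode
    {
        Node Next { get; set; }
        Node Previous { get; set; }
    }

    public sealed class Node : INode
    {
        private Node _next;
        private Node _previous;
        Node INode.Next { get => _next; set => _next = value; }
        Node INode.Previous { get => _previous; set => _previous = value; }
        public T Value { get; internal set; }
    }

    private readonly Node _root = new Node();
    private Node _end;

    public LinkedList2() { _end = _root; }

    public Node Insert(T value)
    {
        var node = new Node { Value = value };
        ((INode)node).Previous = _end;
        ((INode)_end).Next = node; // safe cast, compiler knows it is valid
        _end = node;
        return node;
    }

    public void Remove(Node node)
    {
        var prev = ((INode)node).Previous;
        var next = ((INode)node).Next;
        if (prev != null) ((INode)prev).Next = next;
        if (next != null) ((INode)next).Previous = prev;
        if (_end == node) _end = prev;
    }

    public IEnumerator<Node> GetEnumerator()
    {
        for (var current = ((INode)_root).Next; current != null;
             current = ((INode)current).Next)
            yield return current;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```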
- P.P.S.: Document your code; describe what it does and how it does it. There are many doubly-linked lists (null-terminated, cyclic, and hybrid, and each can have zero, one, or two embedded nodes, which together means 9 possible implementations; not all combinations are used, but that is still a lot of possible versions). (user52292, Nov 15, 2018)
- I have a very bad feeling about `lock (node.Previous)` (a concurrent `Remove` could change it), but I also think that this lock is actually redundant and that `lock (node) if (node.Next != null) lock (node.Next)` saves the code. An alternative could be `lock (node) lock (node.Previous)`, which should also work. I will give it some more time; this is not trivial (so the answer could as well be: don't try to be too clever, use safer but surely secure locking). (user52292, Nov 15, 2018)
- ... `_end` points to the same object as `_root`. `_end.Next` is assigned to.
- ... `readonly _root` is the only internally persistent node; `_end` is just a pointer to the last node. My apologies.