Here is a small caching utility; can you tell whether it is thread-safe?
public static class CacheService
{
    private readonly static object locker = new object();
    private static Dictionary<string, Dictionary<string, object>> data = new Dictionary<string, Dictionary<string, object>>();

    public static void Insert(string partition, string key, object value)
    {
        lock (locker)
        {
            if (!data.ContainsKey(partition))
            {
                var newVal = new Dictionary<string, object>();
                newVal.Add(key, value);
                data.Add(partition, newVal);
            }
            else
            {
                if (!data[partition].ContainsKey(key))
                {
                    data[partition].Add(key, value);
                }
                else
                {
                    data[partition][key] = value;
                }
            }
        }
    }

    public static bool KeyExists(string partition, string key)
    {
        return data[partition] != null && data[partition][key] != null;
    }

    public static V Get<V>(string partition, string key)
    {
        return (V)data[partition][key];
    }
}
- Comment (paparazzo, May 18, 2017): Why not just use a `ConcurrentDictionary`? msdn.microsoft.com/en-us/library/dd287191(v=vs.110).aspx
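For reference, the comment's suggestion could be sketched roughly like this (nested `ConcurrentDictionary`, with `GetOrAdd` creating a missing partition atomically; the overall shape of the class is my own, not something from the question):

```csharp
using System.Collections.Concurrent;

public static class ConcurrentCacheService
{
    private static readonly ConcurrentDictionary<string, ConcurrentDictionary<string, object>> data =
        new ConcurrentDictionary<string, ConcurrentDictionary<string, object>>();

    public static void Insert(string partition, string key, object value)
    {
        // GetOrAdd creates the partition atomically if it does not exist yet;
        // the indexer then inserts or overwrites the key.
        data.GetOrAdd(partition, _ => new ConcurrentDictionary<string, object>())[key] = value;
    }

    public static bool KeyExists(string partition, string key) =>
        data.TryGetValue(partition, out var sub) && sub.ContainsKey(key);

    public static V Get<V>(string partition, string key) => (V)data[partition][key];
}
```

Note this only makes each individual operation thread-safe; check-then-act sequences across calls (e.g. `KeyExists` followed by `Get`) still race, as the answer below discusses.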
1 Answer
No, it's not safe, because you're locking only when writing. You prevent multiple writes from occurring simultaneously, but:

- A caller might call `KeyExists()` in the middle of `Insert()`; in this case `data[partition][key]` might be non-`null` but still empty, and `data` or `data[partition]` might be in an intermediate, invalid internal state.
- If you add a `Delete()` function then reads may go through the same indeterminate states.
- When calling `Get()` you can go through the same indeterminate states as above; in addition, if you add `Delete()` then you might be in the middle of that (also, `Delete()` might be called after `KeyExists()` and before `Get()`).
That said, let's look at a few other improvements you can make:

Calling `ContainsKey()` and then looking the value up in the dictionary is pretty inefficient because the search is done twice. There is a `TryGetValue()` method for this.

When you have an `else` section you may want to keep the `if` condition positive; the negation is a small `!` symbol which may go unnoticed.

With `dictionary[key] = value` you do not need to call `ContainsKey()`; that check is already done efficiently inside the setter.

In code (C# 7; on earlier versions you can't use `out var`, so `sub` has to be declared beforehand):
public static void Insert(string partition, string key, object value)
{
    lock (locker)
    {
        if (data.TryGetValue(partition, out var sub))
        {
            sub[key] = value;
        }
        else
        {
            sub = new Dictionary<string, object>();
            sub.Add(key, value);
            data.Add(partition, sub);
        }
    }
}
Now also add locking in the functions that read from the dictionary and you're almost done.
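Adding that locking might look like this (a minimal self-contained sketch; I've also made `KeyExists()` use `TryGetValue()` so it no longer throws for a missing partition, which is an assumption about the intended behavior):

```csharp
using System;
using System.Collections.Generic;

public static class CacheService
{
    private static readonly object locker = new object();
    private static readonly Dictionary<string, Dictionary<string, object>> data =
        new Dictionary<string, Dictionary<string, object>>();

    public static void Insert(string partition, string key, object value)
    {
        lock (locker)
        {
            if (!data.TryGetValue(partition, out var sub))
            {
                sub = new Dictionary<string, object>();
                data.Add(partition, sub);
            }
            sub[key] = value;
        }
    }

    // Reads take the same lock, so they can never observe an
    // intermediate state left by a concurrent Insert().
    public static bool KeyExists(string partition, string key)
    {
        lock (locker)
        {
            return data.TryGetValue(partition, out var sub) && sub.ContainsKey(key);
        }
    }

    public static V Get<V>(string partition, string key)
    {
        lock (locker)
        {
            return (V)data[partition][key];
        }
    }
}
```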
You do not validate parameters. I don't know whether, for your usage scenarios, a `null` value (or an empty string) for `partition`, `key`, and `value` is allowed or not. If not, then you must validate your inputs. Something as simple as:
if (partition == null)
    throw new ArgumentNullException(nameof(partition));

if (partition.Length == 0)
    throw new ArgumentException("Partition cannot be an empty string", nameof(partition));
What about a string made only of spaces? In that case you might change the second check to:

if (String.IsNullOrWhiteSpace(partition))
    throw new ArgumentException("Partition cannot be...", nameof(partition));
Now you can consider whether `Monitor` (which is what `lock` uses) is the most appropriate synchronization mechanism for your cache. If you expect many concurrent reads and (after an initial period) very few writes, then a `ReaderWriterLockSlim` may give you much better performance. Profile.
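A rough sketch of how the read/write split could look with `ReaderWriterLockSlim` (the lock placement here is my own choice, not a drop-in recommendation; measure before committing to it):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class CacheService
{
    private static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private static readonly Dictionary<string, Dictionary<string, object>> data =
        new Dictionary<string, Dictionary<string, object>>();

    public static void Insert(string partition, string key, object value)
    {
        rwLock.EnterWriteLock();   // exclusive: blocks readers and other writers
        try
        {
            if (!data.TryGetValue(partition, out var sub))
            {
                sub = new Dictionary<string, object>();
                data.Add(partition, sub);
            }
            sub[key] = value;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }

    public static bool KeyExists(string partition, string key)
    {
        rwLock.EnterReadLock();    // shared: many readers can hold it at once
        try
        {
            return data.TryGetValue(partition, out var sub) && sub.ContainsKey(key);
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }
}
```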
I left these points to the end but they're actually the first things you should consider.
What about testing? Static methods are a nightmare for testing (if you want to replace them with something else at the call site). You may use a default singleton instance (not the perfect thing to do, but better than nothing).
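One possible shape for that: turn the static class into an instance class behind a default singleton, so tests can construct their own instance or substitute a fake (the interface name `ICacheService` is my invention):

```csharp
using System;
using System.Collections.Generic;

public interface ICacheService
{
    void Insert(string partition, string key, object value);
    bool KeyExists(string partition, string key);
}

public class CacheService : ICacheService
{
    // Convenience default for production code; tests construct their own
    // CacheService (or pass a fake ICacheService) instead.
    public static ICacheService Default { get; } = new CacheService();

    private readonly object locker = new object();
    private readonly Dictionary<string, Dictionary<string, object>> data =
        new Dictionary<string, Dictionary<string, object>>();

    public void Insert(string partition, string key, object value)
    {
        lock (locker)
        {
            if (!data.TryGetValue(partition, out var sub))
            {
                sub = new Dictionary<string, object>();
                data.Add(partition, sub);
            }
            sub[key] = value;
        }
    }

    public bool KeyExists(string partition, string key)
    {
        lock (locker)
        {
            return data.TryGetValue(partition, out var sub) && sub.ContainsKey(key);
        }
    }
}
```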
Your class has nothing of a cache; it's a sort of `MultiMap`, but it has nothing to do with caching. What would I do? Design a non-thread-safe `MultiMultiMap<TKeyOuter, TKeyInner, TValue>` (possibly considering implementing `ILookup`), which I might reuse somewhere else (even though the standard `MultiMap<TKey, TValue>` is probably much more common).
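A sketch of what such a non-thread-safe `MultiMultiMap` might look like (the member names `Set`/`TryGet` are my own choices):

```csharp
using System;
using System.Collections.Generic;

// Deliberately not thread-safe: synchronization is a separate concern,
// layered on top by a wrapper class.
public class MultiMultiMap<TKeyOuter, TKeyInner, TValue>
{
    private readonly Dictionary<TKeyOuter, Dictionary<TKeyInner, TValue>> data =
        new Dictionary<TKeyOuter, Dictionary<TKeyInner, TValue>>();

    public void Set(TKeyOuter outer, TKeyInner inner, TValue value)
    {
        if (!data.TryGetValue(outer, out var sub))
        {
            sub = new Dictionary<TKeyInner, TValue>();
            data.Add(outer, sub);
        }
        sub[inner] = value;
    }

    public bool TryGet(TKeyOuter outer, TKeyInner inner, out TValue value)
    {
        value = default(TValue);
        return data.TryGetValue(outer, out var sub) && sub.TryGetValue(inner, out value);
    }
}
```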
Now I can design a thread-safe access class around it (call it a cache if you want).

Now you should start putting some caching logic inside this class; at the moment you have everything at the call site (because `CacheService` has too few responsibilities). What about:
- A method which accepts a `Func<object>` instead of the object itself?
- A method to clear the cache.
- A method to set an expiration for each item.
- A method to limit the memory size used for caching.
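The first point could look something like this (the method name `GetOrInsert` is hypothetical; note that the factory runs while holding the lock, which keeps the sketch simple but serializes concurrent builders):

```csharp
using System;
using System.Collections.Generic;

public static class CacheService
{
    private static readonly object locker = new object();
    private static readonly Dictionary<string, Dictionary<string, object>> data =
        new Dictionary<string, Dictionary<string, object>>();

    // Builds the value only when it is not already cached, so expensive
    // construction is skipped on a cache hit.
    public static V GetOrInsert<V>(string partition, string key, Func<V> valueFactory)
    {
        lock (locker)
        {
            if (!data.TryGetValue(partition, out var sub))
            {
                sub = new Dictionary<string, object>();
                data.Add(partition, sub);
            }
            if (!sub.TryGetValue(key, out var value))
            {
                value = valueFactory();
                sub.Add(key, value);
            }
            return (V)value;
        }
    }
}
```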
Then you might want to abstract the cache storage (memory, in this case) from the cache interface. What if the objects you're constructing are huge and expensive to build? Maybe switching to disk storage is still better than re-building them (for example if they come from network data...). Move the interface to a base class (or... an interface) and derive this memory cache from that.
Did you consider using `System.Runtime.Caching.MemoryCache` instead?
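`MemoryCache` already handles expiration and memory limits for you. A minimal usage sketch (it has a single key space, so here I fold the partition into the key as `"partition/key"`, which is purely my own convention; requires a reference to the `System.Runtime.Caching` assembly or NuGet package):

```csharp
using System;
using System.Runtime.Caching;

class Demo
{
    static void Main()
    {
        var cache = MemoryCache.Default;

        // Expiration policy is declared per item instead of hand-rolled.
        cache.Set("partition/key", 42, new CacheItemPolicy
        {
            SlidingExpiration = TimeSpan.FromMinutes(10)
        });

        if (cache.Contains("partition/key"))
        {
            var value = (int)cache.Get("partition/key");
            Console.WriteLine(value);
        }
    }
}
```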
- Comment (Tamoghna Chowdhury, May 18, 2017): If the OP's going to `lock` every access, the collection becomes essentially single-threaded and will bottleneck performance in some cases. Kudos to you for suggesting alternative practices, but a bit of example code would go a long way. Still, +1.
- Comment (Adriano Repetti, May 18, 2017): It might not be; I think much depends on his access pattern. `Monitor` is a very lightweight synchronization primitive and it performs a spin wait, a great thing if the protected code is very short. Profiling is the best thing, IMO.
- Comment (Tamoghna Chowdhury, May 18, 2017): I don't think you got what I meant. I meant that if every access is synchronized, then there can be no concurrent accesses, implying that the collection can only be accessed by one thread at a time, and thus accesses to this collection can become a bottleneck in highly concurrent applications. Yes, `Monitor`-based implementations have the lowest overhead among many synchronization primitives, but that wasn't the source of the bottleneck I was talking about; the source is the OP's thread-safety strategy. I'd prefer innately immutable collections with shared memory any day.
- Comment (Tamoghna Chowdhury, May 18, 2017): Concrete example: the Copy-On-Write list pattern, just with contiguous unaltered sections being backed by the same memory between updated versions. Functional languages like Haskell and Erlang have this built in, and I think Chris Okasaki's "Purely Functional Data Structures" is a must-read on this subject.
- Comment (Tamoghna Chowdhury, May 19, 2017): I'll try my best to "show you the light" of the FP way, as it were :P. Oh, both the Maps I was talking about are immutable, thus the result of chaining them together is also immutable. Lookups are lock-free. Any mutations just create new maps, and then the runtime can use software transactional memory (read: atomic updates) to see to it that there is little or no thread contention. Now if your runtime has no atomically updated references you're quite out of luck (the JVM does, though, as my primary language Scala internally relies on it).
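For what it's worth, the copy-on-write idea from these comments can also be sketched in C# with `System.Collections.Immutable` (a NuGet package; the compare-and-swap retry loop is the standard pattern for atomically updating a reference, and the class shape is my own):

```csharp
using System;
using System.Collections.Immutable;
using System.Threading;

// Lock-free reads: readers grab an immutable snapshot; writers build a
// new version and swap the root reference with a compare-and-swap loop.
public static class ImmutableCache
{
    private static ImmutableDictionary<string, ImmutableDictionary<string, object>> data =
        ImmutableDictionary<string, ImmutableDictionary<string, object>>.Empty;

    public static void Insert(string partition, string key, object value)
    {
        while (true)
        {
            var current = data;
            var sub = current.TryGetValue(partition, out var existing)
                ? existing
                : ImmutableDictionary<string, object>.Empty;
            var updated = current.SetItem(partition, sub.SetItem(key, value));

            // Succeeds only if no other writer changed data in the meantime;
            // otherwise retry against the newer snapshot.
            if (Interlocked.CompareExchange(ref data, updated, current) == current)
                return;
        }
    }

    public static bool KeyExists(string partition, string key)
    {
        // No lock needed: a snapshot is never in an intermediate state.
        var current = data;
        return current.TryGetValue(partition, out var sub) && sub.ContainsKey(key);
    }
}
```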