Here is a small caching utility; can you tell whether it is thread-safe?
public static class CacheService
{
    private readonly static object locker = new object();
    private static Dictionary<string, Dictionary<string, object>> data = new Dictionary<string, Dictionary<string, object>>();

    public static void Insert(string partition, string key, object value)
    {
        lock (locker)
        {
            if (!data.ContainsKey(partition))
            {
                var newVal = new Dictionary<string, object>();
                newVal.Add(key, value);
                data.Add(partition, newVal);
            }
            else
            {
                if (!data[partition].ContainsKey(key))
                {
                    data[partition].Add(key, value);
                }
                else
                {
                    data[partition][key] = value;
                }
            }
        }
    }

    public static bool KeyExists(string partition, string key)
    {
        return data[partition] != null && data[partition][key] != null;
    }

    public static V Get<V>(string partition, string key)
    {
        return (V)data[partition][key];
    }
}
- Comment (paparazzo, May 18, 2017): Why not just use a `ConcurrentDictionary`? msdn.microsoft.com/en-us/library/dd287191(v=vs.110).aspx
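For reference, the comment's suggestion could be sketched roughly like this (nested `ConcurrentDictionary`, with `GetOrAdd` creating a missing partition atomically; the overall shape of the class is my own, not something from the question):

```csharp
using System.Collections.Concurrent;

public static class ConcurrentCacheService
{
    private static readonly ConcurrentDictionary<string, ConcurrentDictionary<string, object>> data =
        new ConcurrentDictionary<string, ConcurrentDictionary<string, object>>();

    public static void Insert(string partition, string key, object value)
    {
        // GetOrAdd creates the partition atomically if it does not exist yet;
        // the indexer then inserts or overwrites the key.
        data.GetOrAdd(partition, _ => new ConcurrentDictionary<string, object>())[key] = value;
    }

    public static bool KeyExists(string partition, string key) =>
        data.TryGetValue(partition, out var sub) && sub.ContainsKey(key);

    public static V Get<V>(string partition, string key) => (V)data[partition][key];
}
```

Note this only makes each individual operation thread-safe; check-then-act sequences across calls (e.g. `KeyExists` followed by `Get`) still race, as the answer below discusses.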
1 Answer
No, it's not safe, because you're locking only when writing. You prevent multiple writes from occurring simultaneously, but:

- A caller might call `KeyExists()` in the middle of `Insert()`; in this case `data[partition][key]` might be non-`null` but still empty, and `data` or `data[partition]` might be in an intermediate, invalid internal state.
- If you add a `Delete()` function then reads may go through the same indeterminate states.
- When calling `Get()` you can go through the same indeterminate states as above; in addition, if you add `Delete()` then you might be in the middle of that (also, `Delete()` might be called after `KeyExists()` and before `Get()`).
That said, let's look at a few other improvements you can make:

Calling `ContainsKey()` and then looking the value up in the dictionary is pretty inefficient because the search is done twice. There is a `TryGetValue()` method for this.

When you have an `else` section you may want to keep the `if` condition positive; the negation is a small `!` symbol which may go unnoticed.

With `dictionary[key] = value` you do not need to call `ContainsKey()`; that check is already done efficiently inside the setter.

In code (C# 7; on earlier versions you can't use `out var`, so `sub` has to be declared beforehand):
public static void Insert(string partition, string key, object value)
{
    lock (locker)
    {
        if (data.TryGetValue(partition, out var sub))
        {
            sub[key] = value;
        }
        else
        {
            sub = new Dictionary<string, object>();
            sub.Add(key, value);
            data.Add(partition, sub);
        }
    }
}
Now also add locking in the functions that read from the dictionary and you're almost done.
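Adding that locking might look like this (a minimal self-contained sketch; I've also made `KeyExists()` use `TryGetValue()` so it no longer throws for a missing partition, which is an assumption about the intended behavior):

```csharp
using System;
using System.Collections.Generic;

public static class CacheService
{
    private static readonly object locker = new object();
    private static readonly Dictionary<string, Dictionary<string, object>> data =
        new Dictionary<string, Dictionary<string, object>>();

    public static void Insert(string partition, string key, object value)
    {
        lock (locker)
        {
            if (!data.TryGetValue(partition, out var sub))
            {
                sub = new Dictionary<string, object>();
                data.Add(partition, sub);
            }
            sub[key] = value;
        }
    }

    // Reads take the same lock, so they can never observe an
    // intermediate state left by a concurrent Insert().
    public static bool KeyExists(string partition, string key)
    {
        lock (locker)
        {
            return data.TryGetValue(partition, out var sub) && sub.ContainsKey(key);
        }
    }

    public static V Get<V>(string partition, string key)
    {
        lock (locker)
        {
            return (V)data[partition][key];
        }
    }
}
```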
You do not validate parameters. I don't know whether, for your usage scenarios, a `null` value (or an empty string) for `partition`, `key`, and `value` is allowed or not. If not, then you must validate your inputs. Something as simple as:
if (partition == null)
    throw new ArgumentNullException(nameof(partition));

if (partition.Length == 0)
    throw new ArgumentException("Partition cannot be an empty string", nameof(partition));
What about a string made only of spaces? In that case you might change the second check to:

if (String.IsNullOrWhiteSpace(partition))
    throw new ArgumentException("Partition cannot be...", nameof(partition));
Now you can consider whether `Monitor` (which is what `lock` uses) is the most appropriate synchronization mechanism for your cache. If you expect many concurrent reads and (after an initial period) very few writes, then a `ReaderWriterLockSlim` may give you much better performance. Profile.
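A rough sketch of how the read/write split could look with `ReaderWriterLockSlim` (the lock placement here is my own choice, not a drop-in recommendation; measure before committing to it):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class CacheService
{
    private static readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private static readonly Dictionary<string, Dictionary<string, object>> data =
        new Dictionary<string, Dictionary<string, object>>();

    public static void Insert(string partition, string key, object value)
    {
        rwLock.EnterWriteLock();   // exclusive: blocks readers and other writers
        try
        {
            if (!data.TryGetValue(partition, out var sub))
            {
                sub = new Dictionary<string, object>();
                data.Add(partition, sub);
            }
            sub[key] = value;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }

    public static bool KeyExists(string partition, string key)
    {
        rwLock.EnterReadLock();    // shared: many readers can hold it at once
        try
        {
            return data.TryGetValue(partition, out var sub) && sub.ContainsKey(key);
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }
}
```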
I left these points to the end but they're actually the first things you should consider.
What about testing? Static methods are a nightmare for testing (if you want to replace them with something else at the call site). You may use a default singleton instance (not the perfect thing to do, but better than nothing).
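One possible shape for that: turn the static class into an instance class behind a default singleton, so tests can construct their own instance or substitute a fake (the interface name `ICacheService` is my invention):

```csharp
using System;
using System.Collections.Generic;

public interface ICacheService
{
    void Insert(string partition, string key, object value);
    bool KeyExists(string partition, string key);
}

public class CacheService : ICacheService
{
    // Convenience default for production code; tests construct their own
    // CacheService (or pass a fake ICacheService) instead.
    public static ICacheService Default { get; } = new CacheService();

    private readonly object locker = new object();
    private readonly Dictionary<string, Dictionary<string, object>> data =
        new Dictionary<string, Dictionary<string, object>>();

    public void Insert(string partition, string key, object value)
    {
        lock (locker)
        {
            if (!data.TryGetValue(partition, out var sub))
            {
                sub = new Dictionary<string, object>();
                data.Add(partition, sub);
            }
            sub[key] = value;
        }
    }

    public bool KeyExists(string partition, string key)
    {
        lock (locker)
        {
            return data.TryGetValue(partition, out var sub) && sub.ContainsKey(key);
        }
    }
}
```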
Your class has nothing of a cache; it's a sort of `MultiMap`, but it has nothing to do with caching. What would I do? Design a non-thread-safe `MultiMultiMap<TKeyOuter, TKeyInner, TValue>` (possibly considering implementing `ILookup`), which I might reuse somewhere else (even though the standard `MultiMap<TKey, TValue>` is probably much more common).
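A sketch of what such a non-thread-safe `MultiMultiMap` might look like (the member names `Set`/`TryGet` are my own choices):

```csharp
using System;
using System.Collections.Generic;

// Deliberately not thread-safe: synchronization is a separate concern,
// layered on top by a wrapper class.
public class MultiMultiMap<TKeyOuter, TKeyInner, TValue>
{
    private readonly Dictionary<TKeyOuter, Dictionary<TKeyInner, TValue>> data =
        new Dictionary<TKeyOuter, Dictionary<TKeyInner, TValue>>();

    public void Set(TKeyOuter outer, TKeyInner inner, TValue value)
    {
        if (!data.TryGetValue(outer, out var sub))
        {
            sub = new Dictionary<TKeyInner, TValue>();
            data.Add(outer, sub);
        }
        sub[inner] = value;
    }

    public bool TryGet(TKeyOuter outer, TKeyInner inner, out TValue value)
    {
        value = default(TValue);
        return data.TryGetValue(outer, out var sub) && sub.TryGetValue(inner, out value);
    }
}
```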
Now I can design a thread-safe access class around it (call it a cache if you want).

Now you should start putting some caching logic inside this class; at the moment you have everything at the call site (because `CacheService` has too few responsibilities). What about:
- A method which accepts a `Func<object>` instead of the object itself?
- A method to clear the cache.
- A method to set an expiration for each item.
- A method to limit the memory size used for caching.
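The first point could look something like this (the method name `GetOrInsert` is hypothetical; note that the factory runs while holding the lock, which keeps the sketch simple but serializes concurrent builders):

```csharp
using System;
using System.Collections.Generic;

public static class CacheService
{
    private static readonly object locker = new object();
    private static readonly Dictionary<string, Dictionary<string, object>> data =
        new Dictionary<string, Dictionary<string, object>>();

    // Builds the value only when it is not already cached, so expensive
    // construction is skipped on a cache hit.
    public static V GetOrInsert<V>(string partition, string key, Func<V> valueFactory)
    {
        lock (locker)
        {
            if (!data.TryGetValue(partition, out var sub))
            {
                sub = new Dictionary<string, object>();
                data.Add(partition, sub);
            }
            if (!sub.TryGetValue(key, out var value))
            {
                value = valueFactory();
                sub.Add(key, value);
            }
            return (V)value;
        }
    }
}
```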
Then you might want to abstract the cache storage (memory, in this case) from the cache interface. What if the objects you're constructing are huge and expensive to build? Maybe switching to disk storage is still better than re-building them (for example if they come from network data...). Move the interface to a base class (or... an interface) and derive this memory cache from that.
Did you consider using `System.Runtime.Caching.MemoryCache` instead?
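`MemoryCache` already handles expiration and memory limits for you. A minimal usage sketch (it has a single key space, so here I fold the partition into the key as `"partition/key"`, which is purely my own convention; requires a reference to the `System.Runtime.Caching` assembly or NuGet package):

```csharp
using System;
using System.Runtime.Caching;

class Demo
{
    static void Main()
    {
        var cache = MemoryCache.Default;

        // Expiration policy is declared per item instead of hand-rolled.
        cache.Set("partition/key", 42, new CacheItemPolicy
        {
            SlidingExpiration = TimeSpan.FromMinutes(10)
        });

        if (cache.Contains("partition/key"))
        {
            var value = (int)cache.Get("partition/key");
            Console.WriteLine(value);
        }
    }
}
```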
- Comment (Tamoghna Chowdhury, May 18, 2017): If the OP's going to `lock` every access, the collection becomes essentially single-threaded and will bottleneck performance in some cases. Kudos to you for suggesting alternative practices, but a bit of example code would go a long way. Still, +1.
- Comment (Adriano Repetti, May 18, 2017): It might not be; I think much depends on his access pattern. `Monitor` is a very lightweight synchronization primitive and it performs a spin wait, a great thing if the protected code is very short. Profiling is the best thing, IMO.
- Comment (Tamoghna Chowdhury, May 18, 2017): I don't think you got what I meant. I meant that if every access is synchronized, then there can be no concurrent accesses, implying that the collection can only be accessed by one thread at a time, and thus accesses to this collection can become a bottleneck in highly concurrent applications. Yes, `Monitor`-based implementations have the lowest overhead among many synchronization primitives, but that wasn't the source of the bottleneck I was talking about; the source is the OP's thread-safety strategy. I'd prefer innately immutable collections with shared memory any day.
- Comment (Tamoghna Chowdhury, May 18, 2017): Concrete example: the Copy-On-Write list pattern, just with contiguous unaltered sections being backed by the same memory between updated versions. Functional languages like Haskell and Erlang have this built in, and I think Chris Okasaki's "Purely Functional Data Structures" is a must-read on this subject.
- Comment (Tamoghna Chowdhury, May 19, 2017): I'll try my best to "show you the light" of the FP way, as it were :P. Oh, both the Maps I was talking about are immutable, thus the result of chaining them together is also immutable. Lookups are lock-free. Any mutations just create new maps, and then the runtime can use software transactional memory (read: atomic updates) to see to it that there is little or no thread contention. Now if your runtime has no atomically updated references you're quite out of luck (the JVM does, though, as my primary language Scala internally relies on it).
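For what it's worth, the copy-on-write idea from these comments can also be sketched in C# with `System.Collections.Immutable` (a NuGet package; the compare-and-swap retry loop is the standard pattern for atomically updating a reference, and the class shape is my own):

```csharp
using System;
using System.Collections.Immutable;
using System.Threading;

// Lock-free reads: readers grab an immutable snapshot; writers build a
// new version and swap the root reference with a compare-and-swap loop.
public static class ImmutableCache
{
    private static ImmutableDictionary<string, ImmutableDictionary<string, object>> data =
        ImmutableDictionary<string, ImmutableDictionary<string, object>>.Empty;

    public static void Insert(string partition, string key, object value)
    {
        while (true)
        {
            var current = data;
            var sub = current.TryGetValue(partition, out var existing)
                ? existing
                : ImmutableDictionary<string, object>.Empty;
            var updated = current.SetItem(partition, sub.SetItem(key, value));

            // Succeeds only if no other writer changed data in the meantime;
            // otherwise retry against the newer snapshot.
            if (Interlocked.CompareExchange(ref data, updated, current) == current)
                return;
        }
    }

    public static bool KeyExists(string partition, string key)
    {
        // No lock needed: a snapshot is never in an intermediate state.
        var current = data;
        return current.TryGetValue(partition, out var sub) && sub.ContainsKey(key);
    }
}
```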