I have been attempting to implement a cache which caches the results of an asynchronous method, with the restriction that I only want that method to run once for any particular item in the cache. Many other implementations I have seen allow the method to run multiple times. Although only the first returned value is actually stored, if this is a long running method, this is not desirable for me.
Storing Task<T>
(instead of T
) in the cache, as recommended by most implementers, is a good idea, but I still could not work out how to prevent the method that fetches the value (the valueFactory) from running more than once for any given item. The solution I came up with was to use TaskCompletionSource instead of Task for the items in the cache.
This is what I came up with:
public class AsyncCache<TKey, TValue>
{
private readonly Func<TKey, Task<TValue>> valueFactory;
private readonly ConcurrentDictionary<TKey, TaskCompletionSource<TValue>> completionSourceCache =
new ConcurrentDictionary<TKey, TaskCompletionSource<TValue>>();
public AsyncCache(Func<TKey, Task<TValue>> valueFactory)
{
this.valueFactory = valueFactory;
}
public async Task<TValue> GetItem(TKey key)
{
if (!completionSourceCache.TryAdd(key, new TaskCompletionSource<TValue>()))
{
return await completionSourceCache[key].Task;
}
TaskCompletionSource<TValue> taskSource = completionSourceCache[key];
try
{
var result = await valueFactory(key);
taskSource.SetResult(result);
}
catch (Exception e)
{
taskSource.SetException(e);
}
return await taskSource.Task;
}
}
The idea was that a key is associated with a TaskCompletionSource. The first time a particular item is requested, the TaskCompletionSource will be added to the dictionary for that key. Any subsequent (potentially reentrant) callers would be returned the Task associated with the TaskCompletionSource.
The first caller would then go on to actually call the method, and once the value was determined, this would be associated with the TaskCompletionSource, and anyone waiting on that Task would be able to view the result.
This is my first use of TaskCompletionSource, so please tell me if you think I've missed anything, or done anything wrong.
Note: I am aware of the ParallelExtensionsExtras AsyncCache here: https://blogs.msdn.microsoft.com/pfxteam/2010/04/23/parallelextensionsextras-tour-12-asynccache/
but I only found this after I'd written the above! This is the only other implementation I've found with the semantics described, but since it uses Lazy, it does not await the result of the value factory. I'm not sure if this makes any difference though, i.e. does this mean it is not truly asynchronous, are new threads created, etc.
2 Answers 2
The way your code is today this should work fine. But if you ever decide to add a RemoveItem
method you can have issues. Since you are checking if the key exist with TryAdd
then accessing assuming the key is there. This works because nothing can remove a value from the ConcurrentDictionary
as it stands. But if you add the ability to remove an item from the ConcurrentDictionary
then calling a TryAdd
then assuming it's there is dangerous since the RemoveItem
could have been processed between those statements. This is why the ConcurrentDictionary
has a GetOrAdd
method to handle that situation for you.
With using a TaskCompletionSource
you would struggle to know if you should call the factory method or not from the GetOrAdd
since you wouldn't know if your thread was the one to add it. This is where the Lazy<>
version shines as it doesn't matter who added the Lazy<Task<TValue>>
both threads can await it since it was passed in the valuefactory.
To me it would be a natural path that RemoveItem
might be added later as most caching has a way to invalidate the cache items, for example if the Task
threw an Exception
maybe I want to remove it from the cache to try again. Someone coming after to add a RemoveItem
might not catch the subtle bug they introduced in the AddItem
that was working.
In a nutshell it will work today but might not be as future proof as you want it to be.
-
\$\begingroup\$ Ah yes! Thank you. Ok, so I've made an edit which I think will take care of this, using
GetOrAdd
. Hopefully I've not missed anything. It would be nice to get this working, because I'm still not sure whether theLazy<T>
implementation calls the factory asynchronously. I suppose there's alwaysAsyncLazy<T>
... \$\endgroup\$bornfromanegg– bornfromanegg2017年08月30日 08:41:11 +00:00Commented Aug 30, 2017 at 8:41 -
\$\begingroup\$ @bornfromanegg you don't need to await the factory in the lazy implantation. It's good enough to return the task. awaiting doesn't "start" the task that happens at creation. awaiting just says don't do any code following the await until the task returned something. See stackoverflow.com/questions/38017016/… \$\endgroup\$CharlesNRice– CharlesNRice2017年08月30日 13:56:42 +00:00Commented Aug 30, 2017 at 13:56
-
\$\begingroup\$ Ah yes, of course. The Lazy implementation is looking better now the more I look at it. \$\endgroup\$bornfromanegg– bornfromanegg2017年09月01日 11:29:38 +00:00Commented Sep 1, 2017 at 11:29
Based on @CharlesNRice's comments, I have revised the GetItem code as follows:
public async Task<TValue> GetItem(TKey key)
{
var newSource = new TaskCompletionSource<TValue>();
var currentSource = completionSourceCache.GetOrAdd(key, newSource);
if (currentSource != newSource)
{
return await currentSource.Task;
}
try
{
var result = await valueFactory(key);
newSource.SetResult(result);
}
catch (Exception e)
{
newSource.SetException(e);
}
return await newSource.Task;
}
Hopefully this will address the remove issue. If currentSource != newSource
that means that the item was already in the dictionary and we should just return that. If someone removes the item immediately after GetOrAdd
then that's fine. We just return what we have, and if anyone else subsequently asks for the same item, it will be readded.
-
\$\begingroup\$ Also something you might want to consider if you sticking with awaiting the factory is if you need configureawait false or not. Typically any framework code would have configureawait(false) if it doesn't need to switch back to the thread that started the await. \$\endgroup\$CharlesNRice– CharlesNRice2017年08月30日 13:58:41 +00:00Commented Aug 30, 2017 at 13:58
-
\$\begingroup\$ Yes, although presumably I wouldn't need it on the last await, since that task will always be complete by the time I await it, so my understanding is that it will just continue in the current context. \$\endgroup\$bornfromanegg– bornfromanegg2017年09月01日 11:32:27 +00:00Commented Sep 1, 2017 at 11:32
Task
cannot be re-run. What do you need this cache for? Isn't it the same asTask.WaitAll
? Perhaps you could add a usage example? \$\endgroup\$dict.TryAdd(key, valueFactory(key))
. This would work, but the item is not added to the dictionary until aftervalueFactory(key)
completes. Thus a reentrant thread could call TryAdd a second time, and run valueFactory again. Although only one result would be added to the dictionary, it would mean that valueFactory had run twice for the same key, which is not what I want in this case. \$\endgroup\$pending
dictionary with a lock, and have it map item to timestamp, which would let you timeout and retry ancient requests. Or chain subsequent tasks to depend on result from initial task. \$\endgroup\$