Implementation of Asynchronous Cache

Question 1

I have been attempting to implement a cache which caches the results of an asynchronous method, with the restriction that I only want that method to run once for any particular item in the cache. Many other implementations I have seen allow the method to run multiple times. Although only the first returned value is actually stored, if this is a long running method, this is not desirable for me.

Storing Task<T> (instead of T) in the cache, as recommended by most implementers, is a good idea, but I still could not work out how to prevent the method that fetches the value (the valueFactory) from running more than once for any given item. The solution I came up with was to use TaskCompletionSource instead of Task for the items in the cache.

This is what I came up with:

public class AsyncCache<TKey, TValue>
 {
 private readonly Func<TKey, Task<TValue>> valueFactory;
 private readonly ConcurrentDictionary<TKey, TaskCompletionSource<TValue>> completionSourceCache =
 new ConcurrentDictionary<TKey, TaskCompletionSource<TValue>>();
 public AsyncCache(Func<TKey, Task<TValue>> valueFactory)
 {
 this.valueFactory = valueFactory;
 }
 public async Task<TValue> GetItem(TKey key)
 {
 if (!completionSourceCache.TryAdd(key, new TaskCompletionSource<TValue>()))
 {
 return await completionSourceCache[key].Task;
 }
 TaskCompletionSource<TValue> taskSource = completionSourceCache[key];
 try
 {
 var result = await valueFactory(key);
 taskSource.SetResult(result);
 }
 catch (Exception e)
 {
 taskSource.SetException(e);
 }
 return await taskSource.Task;
 }
 }

The idea was that a key is associated with a TaskCompletionSource. The first time a particular item is requested, the TaskCompletionSource will be added to the dictionary for that key. Any subsequent (potentially reentrant) callers would be returned the Task associated with the TaskCompletionSource.

The first caller would then go on to actually call the method, and once the value was determined, this would be associated with the TaskCompletionSource, and anyone waiting on that Task would be able to view the result.

This is my first use of TaskCompletionSource, so please tell me if you think I've missed anything, or done anything wrong.

Note: I am aware of the ParallelExtensionsExtras AsyncCache here: https://blogs.msdn.microsoft.com/pfxteam/2010/04/23/parallelextensionsextras-tour-12-asynccache/

but I only found this after I'd written the above! This is the only other implementation I've found with the semantics described, but since it uses Lazy, it does not await the result of the value factory. I'm not sure if this makes any difference though, i.e. does this mean it is not truly asynchronous, are new threads created, etc.

Question 2

I still could not work out how to prevent the task from running more than once for any given item What do you mean by this? Once completed a Task cannot be re-run. What do you need this cache for? Isn't it the same as Task.WaitAll? Perhaps you could add a usage example?

Question 3

Sorry - I've edited the question to make it a bit clearer, hopefully. But essentially: consider if I'd had a cache of Tasks. The add to the dictionary would then look something like dict.TryAdd(key, valueFactory(key)). This would work, but the item is not added to the dictionary until after valueFactory(key) completes. Thus a reentrant thread could call TryAdd a second time, and run valueFactory again. Although only one result would be added to the dictionary, it would mean that valueFactory had run twice for the same key, which is not what I want in this case.

Question 4

Note the solution here from the excellent @svick. This also allows the request to run multiple times. This is fine in some situations, but not what I wanted.

Question 5

Thank your for explaining. I think I now understand how it works ;-)

Question 6

It sounds like you want to immediately make a note that item is pending before invoking your current code. For example you might protect your pending dictionary with a lock, and have it map item to timestamp, which would let you timeout and retry ancient requests. Or chain subsequent tasks to depend on result from initial task.

Question 7

The way your code is today this should work fine. But if you ever decide to add a RemoveItem method you can have issues. Since you are checking if the key exist with TryAdd then accessing assuming the key is there. This works because nothing can remove a value from the ConcurrentDictionary as it stands. But if you add the ability to remove an item from the ConcurrentDictionary then calling a TryAdd then assuming it's there is dangerous since the RemoveItem could have been processed between those statements. This is why the ConcurrentDictionary has a GetOrAdd method to handle that situation for you.

With using a TaskCompletionSource you would struggle to know if you should call the factory method or not from the GetOrAdd since you wouldn't know if your thread was the one to add it. This is where the Lazy<> version shines as it doesn't matter who added the Lazy<Task<TValue>> both threads can await it since it was passed in the valuefactory.

To me it would be a natural path that RemoveItem might be added later as most caching has a way to invalidate the cache items, for example if the Task threw an Exception maybe I want to remove it from the cache to try again. Someone coming after to add a RemoveItem might not catch the subtle bug they introduced in the AddItem that was working.

In a nutshell it will work today but might not be as future proof as you want it to be.

Question 8

Ah yes! Thank you. Ok, so I've made an edit which I think will take care of this, using GetOrAdd. Hopefully I've not missed anything. It would be nice to get this working, because I'm still not sure whether the Lazy<T> implementation calls the factory asynchronously. I suppose there's always AsyncLazy<T>...

Question 9

@bornfromanegg you don't need to await the factory in the lazy implantation. It's good enough to return the task. awaiting doesn't "start" the task that happens at creation. awaiting just says don't do any code following the await until the task returned something. See stackoverflow.com/questions/38017016/…

Question 10

Ah yes, of course. The Lazy implementation is looking better now the more I look at it.

Question 11

Based on @CharlesNRice's comments, I have revised the GetItem code as follows:

public async Task<TValue> GetItem(TKey key)
{
 var newSource = new TaskCompletionSource<TValue>();
 var currentSource = completionSourceCache.GetOrAdd(key, newSource);
 if (currentSource != newSource)
 {
 return await currentSource.Task;
 }
 try
 {
 var result = await valueFactory(key);
 newSource.SetResult(result);
 }
 catch (Exception e)
 {
 newSource.SetException(e);
 }
 return await newSource.Task;
}

Hopefully this will address the remove issue. If currentSource != newSource that means that the item was already in the dictionary and we should just return that. If someone removes the item immediately after GetOrAdd then that's fine. We just return what we have, and if anyone else subsequently asks for the same item, it will be readded.

Question 12

Also something you might want to consider if you sticking with awaiting the factory is if you need configureawait false or not. Typically any framework code would have configureawait(false) if it doesn't need to switch back to the thread that started the await.

Question 13

Yes, although presumably I wouldn't need it on the last await, since that task will always be complete by the time I await it, so my understanding is that it will just continue in the current context.

CharlesNRice CharlesNRice 4,43816 silver badges18 bronze badges · Accepted Answer · 2017-08-29 16:55:42Z

The way your code is today this should work fine. But if you ever decide to add a RemoveItem method you can have issues. Since you are checking if the key exist with TryAdd then accessing assuming the key is there. This works because nothing can remove a value from the ConcurrentDictionary as it stands. But if you add the ability to remove an item from the ConcurrentDictionary then calling a TryAdd then assuming it's there is dangerous since the RemoveItem could have been processed between those statements. This is why the ConcurrentDictionary has a GetOrAdd method to handle that situation for you.

With using a TaskCompletionSource you would struggle to know if you should call the factory method or not from the GetOrAdd since you wouldn't know if your thread was the one to add it. This is where the Lazy<> version shines as it doesn't matter who added the Lazy<Task<TValue>> both threads can await it since it was passed in the valuefactory.

To me it would be a natural path that RemoveItem might be added later as most caching has a way to invalidate the cache items, for example if the Task threw an Exception maybe I want to remove it from the cache to try again. Someone coming after to add a RemoveItem might not catch the subtle bug they introduced in the AddItem that was working.

In a nutshell it will work today but might not be as future proof as you want it to be.

Ah yes! Thank you. Ok, so I've made an edit which I think will take care of this, using GetOrAdd. Hopefully I've not missed anything. It would be nice to get this working, because I'm still not sure whether the Lazy<T> implementation calls the factory asynchronously. I suppose there's always AsyncLazy<T>...
@bornfromanegg you don't need to await the factory in the lazy implantation. It's good enough to return the task. awaiting doesn't "start" the task that happens at creation. awaiting just says don't do any code following the await until the task returned something. See stackoverflow.com/questions/38017016/…
Ah yes, of course. The Lazy implementation is looking better now the more I look at it.

Stack Exchange Network

Implementation of Asynchronous Cache

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Implementation of Asynchronous Cache

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions