We have a working implementation that is doing I/O operations, returning data from blobs in an async/await manner from Azure Blob Storage.
//Method1 is not async, this is what is called from the controller
public IEnumerable<Data> Method1()
{
//Running the async method and returning the result from task
return Task.Run(() => GetDataAsync()).Result;
}
private async Task<IEnumerable<Data>> GetDataAsync()
{
//There are multiple blob address where the data is held. The code creates, in parallel multiple tasks for each address.
//It returns tasks that will be run in Async pattern
var tasks = multipleBlobAddress.AsParallel.Select(blobAddress =>{
Task<IEnumerable<Data>> task = GetDataFromBlobsAsync(blobAddress);
return task;
});
//Awaits all tasks to complete
var completedTasks = await Task.WhenAll<IEnumerable<Data>>(tasks);
//Selects all the tasks and returns them. Each tasks has data.
return completedTasks.SelectMany(t => t);
}
private Task<IEnumerable<Data>> GetDataFromBlobsAsync(string address)
{
//Opens the blob and reads from it
using (var blobStream = blobService.OpenRead(address))
{
//Deserialises the data read from the blob
data = service.DeserialiseData(blobStream);
}
return Task.FromResult<IEnumerable<Data>>(data);
}
We have understood that the best way to read from blobs is to follow the async/await pattern and not using the AsParallel
method (other suggestions are most welcomed). I have the following questions:
- By creating tasks in parallel and then waiting for all of them to complete, do we lose any performance?
- What are the obvious areas where we got it wrong or we can improve and potentially increase the performance or make the code better to maintain/read?
- Are we correctly following the async / await pattern?
If you need any extra information about the code, I will happily provide it and hope to make it clearer.
3 Answers 3
There is never a need to do this:
return Task.Run(() => GetDataAsync()).Result;
Why start a new task and then synchronously wait with .Result
? This is the sort of code that will bite you later when you find out that a synchronisation context can cause a deadlock. See Stephen Cleary's blog post on the subject of an async deadlock. Stephen's blog is an absolute gold mine of information on aysnc
and await
.
Either use the synchronous api all the way from your controller down into the Azure Storage SDK or use the asynchronous one (with async
and await
). Don't mix and match - that's a source of errors and also pointless. Aysnc is useful when there's something else the thread could be doing. E.g. in a web app that's serving requests. If the web app is blocked by .Result
or Wait
then there is no benefit and you should use the synchronous API.
-
\$\begingroup\$ Thank you for this. This is the kind of answer I was looking for! I will definitely have a look at the blog post you mentioned. The reason we are using async is that we think it helps us downloading multiple blobs at the same time and not one after the other. Would a Task-based Asynchronous pattern helps us or hinders us even more? \$\endgroup\$Mihai Neagoe– Mihai Neagoe2016年11月01日 15:11:21 +00:00Commented Nov 1, 2016 at 15:11
-
\$\begingroup\$ This is my answer from another comment why we are using
Task.Run(() => GetDataAsync()).Result
- "We are implementing against an interface where Method1 is not a async member, but we can change that interface if it's necessary and it impacts the performance, we are not bound to it." \$\endgroup\$Mihai Neagoe– Mihai Neagoe2016年11月01日 15:18:14 +00:00Commented Nov 1, 2016 at 15:18 -
\$\begingroup\$ what I think this line of code does
return Task.Run(() => GetDataAsync()).Result;
, it queues the task to run and then it returns the result. Not really sure what you mean by not returning anything. \$\endgroup\$Mihai Neagoe– Mihai Neagoe2016年11月01日 16:36:16 +00:00Commented Nov 1, 2016 at 16:36 -
3\$\begingroup\$ @MihaiNeagoe - and what benefit do you think it gives you over simply doing
GetDataAsync().Result
? \$\endgroup\$RobH– RobH2016年11月01日 20:17:42 +00:00Commented Nov 1, 2016 at 20:17
private async Task<IEnumerable<Data>> GetDataAsync() { //There are multiple blob address where the data is held. The code creates, in parallel multiple tasks for each address. //It returns tasks that will be run in Async pattern var tasks = multipleBlobAddress.AsParallel.Select(blobAddress =>{ Task<IEnumerable<Data>> task = GetDataFromBlobsAsync(blobAddress); return task; });
Firstly, the purpose of AsParallel
is to split work between multiple threads; the purpose of async
/await
is to avoid the need for multiple threads when handling things such as I/O. So combining AsParallel
with async
/await
is a code smell.
Secondly, what's multipleBlobAddress
? A field of the class? In my opinion it should be an argument to the method rather than a hidden parameter.
Thirdly, that lambda is far more complicated than it needs to be. The code I've quoted above could be
private async Task<IEnumerable<Data>> GetDataAsync(IEnumerable<string> blobAddresses)
{
var tasks = blobAddresses.Select(GetDataFromBlobsAsync);
//Awaits all tasks to complete var completedTasks = await Task.WhenAll<IEnumerable<Data>>(tasks); //Selects all the tasks and returns them. Each tasks has data. return completedTasks.SelectMany(t => t); }
This will work, but it launches all of the downloads simultaneously. You'll probably find (and I speak from experience here) that you get better performance by running no more than n
simultanous downloads for some value of n
between 4
and 16
. Writing a WhenAllThrottled<T extends Task>(IEnumerable<T>, int)
method which uses WhenAny
to launch a new task when one completes is a good exercise in async
/await
programming. Note that the IEnumerable<T>
has to be lazy for it to actually work as intended.
private Task<IEnumerable<Data>> GetDataFromBlobsAsync(string address) { //Opens the blob and reads from it using (var blobStream = blobService.OpenRead(address)) { //Deserialises the data read from the blob data = service.DeserialiseData(blobStream); } return Task.FromResult<IEnumerable<Data>>(data); }
This is quite alarming. What is data
? It seems to be using a field to pass a value whose scope is a single method. If the tasks are executing in separate threads then this is a definite bug, because there's no synchronisation and one thread could overwrite data
between another thread's assignment and invocation of Task.FromResult<T>
. Without AsParallel
it's probably safe, but it's definitely a code smell and risks becoming a race condition if the class is refactored.
I should also add that the name is confusing. If it's just getting data from one blob, why is it not GetDataFromBlobAsync
?
-
\$\begingroup\$ Thank you for your answer, really good points to make. For the first point we were not aware that combining parallel and async/await would be code smell, noted and we will review our architecture. Second point,
multipleBlobAddress
is as you said an argument and also as you wrote in the code, it is an IEnumerable of a object. For your third, we will look into limiting the number of downloads, very good advice. And for your final point,data
is an IEnumerable of a object and each object holds a certain amount of data. It's bound to the method scope. I will edit my code. \$\endgroup\$Mihai Neagoe– Mihai Neagoe2016年11月02日 13:41:34 +00:00Commented Nov 2, 2016 at 13:41
To answer the questions first,
- Running Parallelism is for better efficiency and speed when executing the same code repeatedly.
- You can get threading wrong for not watching out for common threading issues. Msdn link has some good General Recommendations.
- Not really, async and await is not the same as Parallelism as you might be thinking, there is an excellent course on Pluralsight which explained what it was and how it was properly implemented. - I'm not sure if I can provide a link can a moderator let me know?
Difference between the two is that the Parallelism/Threading is executing the same code but with different parameters at the same time.
Async and Await is a mechanism to let the computer decide to stop working the current process, Await, as the response is not ready at that time. The computer can go and do something else. When the Async result is returned the computer will come back and continue on. Basically not to block on the main thread.
Ok so looking at your code, delete method1
and use Async and await all the way through the code.
I've tried to change your code, but as its not complete I can only attempt to fill in the gaps for you.
public async Task<IEnumerable<Data>> GetDataAsync()
{
//There are multiple blob address where the data is held. The code creates, in parallel multiple tasks for each address.
//It returns tasks that will be run in Async pattern
var tasks = multipleBlobAddress.AsParallel.Select(async blobAddress =>
{
IEnumerable<Data> task = await GetDataFromBlobsAsync(blobAddress);
return task;
});
//Awaits all tasks to complete
var completedTasks = await Task.WhenAll((IEnumerable<Data>)(tasks));
//Selects all the tasks and returns them. Each tasks has data.
return completedTasks.SelectMany(t => t);
}
private async Task<Data> GetDataFromBlobsAsync(string address)
{
//Opens the blob and reads from it
using (var blobStream = await blobService.OpenRead(address)) // at the bottom there should be a Async method
{
//Deserialises the data read from the blob
data = service.DeserialiseData(blobStream);
}
return data;
}
-
\$\begingroup\$ Thank you for your answer. The reason why we use
Method1
, it helps us to transition from a synchronous code to an asynchronous pattern. If you would like to expand on the reason you felt the method needs deleting, that would be great. For your third answer on, you mention that we are not using the async / await pattern correctly. Could you edit your code on where you think we are not following the pattern correctly? \$\endgroup\$Mihai Neagoe– Mihai Neagoe2016年11月01日 14:17:24 +00:00Commented Nov 1, 2016 at 14:17 -
\$\begingroup\$ I just saw your comment
// at the bottom there should be a Async method
. Thanks for that \$\endgroup\$Mihai Neagoe– Mihai Neagoe2016年11月01日 14:21:45 +00:00Commented Nov 1, 2016 at 14:21 -
\$\begingroup\$ I would flip the question back and ask why would you want to do that? It may be helpful to expand the question to state what you are using, Console App, Web App, Service etc? I have edited the code the best I could with what was in the Question. What research have you done in regards to Async and Await? have you just looked at TAP (Task-based Asynchronous Pattern)? \$\endgroup\$Geek– Geek2016年11月01日 14:27:40 +00:00Commented Nov 1, 2016 at 14:27
-
\$\begingroup\$ We are implementing against an interface where
Method1
is not a async member, but we can change that interface if it's necessary and it impacts the performance, we are not bound to it and it's a web app. This is our first iteration of the code and yes we need to research a little bit more on the async patterns, wanted someone with different eyes to look at the code. \$\endgroup\$Mihai Neagoe– Mihai Neagoe2016年11月01日 14:34:36 +00:00Commented Nov 1, 2016 at 14:34 -
3\$\begingroup\$ Your first point is not sourced at all ... I'd say [citation needed]. The second point is a link-only answer, which is discouraged and the third reads like an advertisement. Overall IMO your answer suffers from a credibility problem :( -1 \$\endgroup\$Vogel612– Vogel6122016年11月02日 16:35:38 +00:00Commented Nov 2, 2016 at 16:35
await Task.WhenAll(IEnumerable<Data>(tasks));
appears to be invoking the typeIEnumerable<Data>
as a function. You should copy-paste the actual working code, not attempt to rewrite it from memory. \$\endgroup\$