I was wondering whether it's true that async-await should not be used for "high-CPU" tasks. I saw this claimed in a presentation.
So I guess
Let's say your CalculateMillionthPrimeNumber was something like the following (not very efficient or ideal in its use of goto, but very simple to understand):
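// Requires: using System.Collections.Generic; using System.Linq; (for Last())
public int CalculateMillionthPrimeNumber()
{
    List<int> primes = new List<int>(1000000){2};
    int num = 3;
    while(primes.Count < 1000000)
    {
        foreach(int div in primes)
        {
            // num is divisible by an earlier prime, so it isn't prime
            if ((num / div) * div == num)
                goto next;
        }
        primes.Add(num);
    next:
        ++num;
    }
    return primes.Last();
}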
Now, there's no useful point here at which this could do something asynchronously. Let's make it a Task-returning method using async:
public async Task<int> CalculateMillionthPrimeNumberAsync()
{
    // No await anywhere in here, so the whole body runs synchronously
    List<int> primes = new List<int>(1000000){2};
    int num = 3;
    while(primes.Count < 1000000)
    {
        foreach(int div in primes)
        {
            if ((num / div) * div == num)
                goto next;
        }
        primes.Add(num);
    next:
        ++num;
    }
    return primes.Last();
}
The compiler will warn us about that (warning CS1998), because there's nowhere for us to await anything useful. Really, calling this is the same as a slightly more complicated version of calling Task.FromResult(CalculateMillionthPrimeNumber()). That is to say, it's the same as doing the calculation and then creating an already-completed task that has the calculated number as its result.
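To see the already-completed behaviour for yourself, here's a minimal sketch, assuming a modern .NET console project. NthPrime and NthPrimeAsync are hypothetical names, the million is shrunk to a count parameter so it runs instantly, and the loop uses `%` and `break` instead of `goto`, which is the same trial division.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AlreadyCompletedDemo
{
    // Plain synchronous trial division: returns the count-th prime.
    public static int NthPrime(int count)
    {
        var primes = new List<int>(count) { 2 };
        int num = 3;
        while (primes.Count < count)
        {
            bool isPrime = true;
            foreach (int div in primes)
            {
                if (num % div == 0) { isPrime = false; break; }
            }
            if (isPrime) primes.Add(num);
            ++num;
        }
        return primes[primes.Count - 1];
    }

    // 'async' with no 'await': the compiler reports warning CS1998 and the
    // whole body runs synchronously before the Task is handed back.
    public static async Task<int> NthPrimeAsync(int count)
    {
        return NthPrime(count);
    }
}
```

Calling `AlreadyCompletedDemo.NthPrimeAsync(10)` hands back a task whose IsCompleted is already true and whose Result is 29, exactly like `Task.FromResult(AlreadyCompletedDemo.NthPrime(10))`.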
Now, already-completed tasks aren't always pointless. For example, consider:
public async Task<string> GetInterestingStringAsync()
{
    if (_cachedInterestingString == null)
        _cachedInterestingString = await GetStringFromWeb();
    return _cachedInterestingString;
}
This returns an already-completed task when the string is already cached, and a genuinely pending task otherwise; in the cached case it returns pretty fast. Another case is when there is more than one implementation of the same interface and not all of them can use async I/O.
And likewise, an async method that awaits this method will return an already-completed task or not, depending on this. It's actually a pretty great way of just staying on the same thread and doing what needs to be done when that is possible.
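Here's the caching example fleshed out into something runnable, as a sketch: GetStringFromWeb is faked with Task.Delay, and FetchCount and GetLengthAsync are hypothetical additions so you can watch what happens. The first call to GetLengthAsync comes back as a pending task; once the string is cached, the same async caller comes back already completed, without ever leaving the thread.

```csharp
using System.Threading.Tasks;

public sealed class InterestingStringCache
{
    private string _cachedInterestingString;

    // How many times the (fake) web fetch actually ran.
    public int FetchCount { get; private set; }

    // Hypothetical stand-in for GetStringFromWeb(); Task.Delay simulates the I/O.
    private async Task<string> GetStringFromWeb()
    {
        FetchCount++;
        await Task.Delay(100);
        return "interesting";
    }

    public async Task<string> GetInterestingStringAsync()
    {
        if (_cachedInterestingString == null)
            _cachedInterestingString = await GetStringFromWeb();
        return _cachedInterestingString;
    }

    // An async caller: it completes synchronously exactly when the call it awaits did.
    public async Task<int> GetLengthAsync()
    {
        string s = await GetInterestingStringAsync();
        return s.Length;
    }
}
```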
But if it's always possible, then the only effect is an extra bit of bloat: creating the Task object and the state machine that async uses to implement it.
So, pretty pointless. If that was how the version in your question was implemented, then calculateMillionthPrimeNumber would have had IsCompleted returning true right from the beginning. You should have just called the non-async version.
Okay, as the implementers of CalculateMillionthPrimeNumberAsync(), we want to do something more useful for our users. So we do:
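public Task<int> CalculateMillionthPrimeNumberAsync()
{
    // Runs the synchronous version on a thread-pool thread
    // (these are the same options Task.Run(CalculateMillionthPrimeNumber) uses)
    return Task.Factory.StartNew(CalculateMillionthPrimeNumber, CancellationToken.None, TaskCreationOptions.DenyChildAttach, TaskScheduler.Default);
}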
Okay, now we're not wasting our users' time. DoIndependentWork() will do stuff at the same time as CalculateMillionthPrimeNumberAsync(), and if it finishes first, then the await will release that thread.
Great!
Only, we haven't really moved the needle that much from the synchronous position. Indeed, especially if DoIndependentWork() isn't arduous, we may have made it a lot worse. The synchronous way would do everything on one thread; let's call it Thread A. The new way does the calculation on Thread B, possibly releases Thread A, and then synchronises back in one of a few possible ways. It's a lot of work; has it gained anything?
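One way to actually see Thread A and Thread B, as a sketch: the hypothetical Compute method stands in for the prime calculation and also reports Environment.CurrentManagedThreadId, and ComputeAsync is the same StartNew wrapper as above. The comment marks where DoIndependentWork() would go.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WrapperThreadDemo
{
    // Hypothetical CPU-bound stand-in that also reports which thread ran it.
    static (long Sum, int ThreadId) Compute()
    {
        long sum = 0;
        for (int i = 0; i < 10_000_000; i++) sum += i;
        return (sum, Environment.CurrentManagedThreadId);
    }

    // The "async wrapper": push the synchronous work onto the thread pool.
    public static Task<(long Sum, int ThreadId)> ComputeAsync() =>
        Task.Factory.StartNew(() => Compute(), CancellationToken.None,
            TaskCreationOptions.DenyChildAttach, TaskScheduler.Default);

    public static async Task<(int CallerThread, int WorkerThread, long Sum)> RunAsync()
    {
        int callerThread = Environment.CurrentManagedThreadId; // "Thread A"
        var task = ComputeAsync();                             // work goes to "Thread B"
        // DoIndependentWork() would run here, still on Thread A.
        var (sum, workerThread) = await task;
        return (callerThread, workerThread, sum);
    }
}
```

Called from a non-pool thread such as a console app's main thread, the two IDs differ: the same answer as the synchronous version, plus a thread hop and a synchronisation step.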
Well, maybe, but the author of CalculateMillionthPrimeNumberAsync() can't know that, because the factors that influence it are all in the calling code. The calling code could have done StartNew itself, and been better able to fit the synchronisation options to the need when it did so.
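A sketch of what it looks like when the caller makes that choice. SumTo is a hypothetical stand-in for the CPU-bound method and DoIndependentWork is a placeholder; the library offers only the synchronous version, and each caller picks whatever mechanism suits it.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CallerChoosesDemo
{
    // The library exposes only the synchronous method (hypothetical stand-in).
    public static long SumTo(int n)
    {
        long sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    // Caller with nothing else to do: just call it synchronously.
    public static long NothingElseToDo(int n) => SumTo(n);

    // Caller with independent work: offload with Task.Run and overlap the two.
    public static async Task<(long Sum, int Other)> WithIndependentWorkAsync(int n)
    {
        Task<long> sumTask = Task.Run(() => SumTo(n));
        int other = DoIndependentWork(); // runs while SumTo is in progress
        return (await sumTask, other);
    }

    // Caller that knows the work runs for a long time: hint that it
    // shouldn't occupy an ordinary pool thread.
    public static Task<long> VeryLongRunning(int n) =>
        Task.Factory.StartNew(() => SumTo(n), CancellationToken.None,
            TaskCreationOptions.LongRunning, TaskScheduler.Default);

    // Hypothetical placeholder for the question's DoIndependentWork().
    static int DoIndependentWork() => 42;
}
```

TaskCreationOptions.LongRunning is only a hint; the default scheduler typically responds by giving the work its own thread rather than a pool thread.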
So, while tasks can be a convenient way of calling CPU-bound code in parallel with another task, methods that merely wrap CPU-bound code this way are not useful. Worse, they're deceptive: someone seeing CalculateMillionthPrimeNumberAsync could be forgiven for believing that calling it wasn't pointless.
Unless CalculateMillionthPrimeNumberAsync itself uses async/await internally, there is no reason not to let the Task run heavy CPU work, since it just delegates your method onto a ThreadPool thread.
What a ThreadPool thread is, and how it differs from a regular thread, is explained here.
In short, it just occupies a thread-pool thread for quite a while (and the number of thread-pool threads is limited), so unless you are occupying too many of them, there is nothing to worry about.
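If you want to see the limit being referred to, ThreadPool exposes it directly. A small sketch: the ThreadPoolLimits wrapper is hypothetical, while GetMaxThreads, GetAvailableThreads and GetMinThreads are the real ThreadPool methods.

```csharp
using System.Threading;

public static class ThreadPoolLimits
{
    // Worker-thread counts only; the I/O completion-port counts are discarded.
    public static (int MaxWorkers, int AvailableWorkers, int MinWorkers) Snapshot()
    {
        ThreadPool.GetMaxThreads(out int maxWorkers, out _);
        ThreadPool.GetAvailableThreads(out int availableWorkers, out _);
        ThreadPool.GetMinThreads(out int minWorkers, out _);
        return (maxWorkers, availableWorkers, minWorkers);
    }
}
```

Each piece of CPU-bound work handed to Task.Run occupies one of those available workers until it finishes; once they are all busy, further work waits in the queue while the pool adds threads only gradually. The actual numbers vary by machine and runtime version, so treat them as diagnostics rather than constants.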
I was wondering whether it's true that async-await should not be used for "high-CPU" tasks.
Yes, that's true.
My question is could the above be justified
I would say that it is not justified. In the general case, you should avoid using Task.Run to implement methods with asynchronous signatures. Don't expose asynchronous wrappers for synchronous methods. This is to prevent confusion by consumers, particularly on ASP.NET.
However, there is nothing wrong with using Task.Run to call a synchronous method, e.g., in a UI app. In this way, you can use multithreading (Task.Run) to keep the UI thread free, and consume it elegantly with await:
var task = Task.Run(() => CalculateMillionthPrimeNumber());
DoIndependentWork();
var prime = await task;
There are, in fact, two major uses of async/await. One (and my understanding is that this is one of the primary reasons that it was put into the framework) is to enable the calling thread to do other work while it's waiting for a result. This is mostly for I/O-bound tasks (i.e. tasks where the main "holdup" is some kind of I/O - waiting for a hard drive, server, printer, etc. to respond or complete its task).
As a side note, if you're using async/await in this way, it's important to make sure that you've implemented it in such a way that the calling thread can actually do other work while it's waiting for the result; I've seen plenty of cases where people do stuff like "A waits for B, which waits for C"; this can end up performing no better than if A just called B synchronously and B just called C synchronously (because the calling thread's never allowed to do other work while it's waiting for the results of B and C).
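A minimal illustration of the difference, assuming Task.Delay as a stand-in for the I/O and a hypothetical log list that records the order things happen in. Awaiting B straight away means the caller does nothing else meanwhile; starting B first and awaiting it later lets the caller get on with its own work.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ChainDemo
{
    // Hypothetical I/O stand-in: Task.Delay plays the role of a slow server.
    static async Task<string> FetchAsync(string name, List<string> log)
    {
        log.Add("start " + name);
        await Task.Delay(100);
        log.Add("end " + name);
        return name;
    }

    // "A waits for B": nothing else happens while B is pending,
    // so this behaves much like a synchronous call.
    public static async Task<List<string>> AwaitImmediatelyAsync()
    {
        var log = new List<string>();
        await FetchAsync("B", log);
        log.Add("other work");
        return log;
    }

    // Start B, do other work, then await: the caller makes progress
    // while B's I/O is outstanding.
    public static async Task<List<string>> StartThenAwaitAsync()
    {
        var log = new List<string>();
        Task<string> b = FetchAsync("B", log);
        log.Add("other work");
        await b;
        return log;
    }
}
```

The log writes aren't synchronised; that's acceptable for a demo where the 100 ms delay keeps them apart, but not something to copy into real code.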
In the case of I/O-bound tasks, there's little point in creating an extra thread just to wait for a result. My usual analogy here is to think of ordering in a restaurant with 10 people in a group. If the first person the waiter asks to order isn't ready yet, the waiter doesn't just wait for him to be ready before he takes anyone else's order, nor does he bring in a second waiter just to wait for the first guy. The best thing to do in this case is to ask the other 9 people in the group for their orders; hopefully, by the time that they've ordered, the first guy will be ready. If not, at least the waiter's still saved some time because he spends less time being idle.
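The waiter's strategy in code, as a sketch: TakeOrderAsync is hypothetical and uses Task.Delay to model a diner who isn't ready yet. All the orders are requested up front and awaited together with Task.WhenAll, so the total time is roughly one delay rather than one per diner, and no thread sits blocked while the delays are pending.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class WaiterDemo
{
    // Hypothetical: each diner takes delayMs to decide. Task.Delay models
    // waiting on I/O and does not tie up a thread while it is pending.
    static async Task<string> TakeOrderAsync(int diner, int delayMs)
    {
        await Task.Delay(delayMs);
        return $"order {diner}";
    }

    public static async Task<(string[] Orders, TimeSpan Elapsed)> ServeTableAsync(int diners, int delayMs)
    {
        var clock = Stopwatch.StartNew();
        // Ask everyone at once instead of waiting on each diner in turn.
        Task<string>[] pending = Enumerable.Range(1, diners)
            .Select(d => TakeOrderAsync(d, delayMs))
            .ToArray();
        // WhenAll keeps the results in the same order as the tasks.
        string[] orders = await Task.WhenAll(pending);
        return (orders, clock.Elapsed);
    }
}
```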
It's also possible to use things like Task.Run to do CPU-bound tasks (this is the second use). To follow our analogy above, this is a case where it would generally be useful to have more waiters, e.g. if there were too many tables for a single waiter to service. Really, all that this actually does "behind the scenes" is use the Thread Pool; it's one of several possible constructs for doing CPU-bound work (e.g. just putting it "directly" on the Thread Pool, explicitly creating a new thread, or using a BackgroundWorker), so it's a design question which mechanism you end up using.
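A sketch of two of those mechanisms side by side, using a hypothetical CountPrimesBelow method as the CPU-bound work: Task.Run hands back a Task you can await, while an explicitly created Thread leaves you to start it, wait for it, and collect the result yourself.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CpuBoundOptions
{
    // Hypothetical CPU-bound job: count primes below 'limit' by trial division.
    public static int CountPrimesBelow(int limit)
    {
        int count = 0;
        for (int n = 2; n < limit; n++)
        {
            bool isPrime = true;
            for (int d = 2; d * d <= n; d++)
                if (n % d == 0) { isPrime = false; break; }
            if (isPrime) count++;
        }
        return count;
    }

    // Option 1: Task.Run queues the work to the thread pool and returns a Task.
    public static Task<int> ViaTaskRun(int limit) =>
        Task.Run(() => CountPrimesBelow(limit));

    // Option 2: an explicitly created thread; we collect the result by hand.
    public static int ViaDedicatedThread(int limit)
    {
        int result = 0;
        var thread = new Thread(() => result = CountPrimesBelow(limit));
        thread.Start();
        thread.Join(); // block until the thread has finished
        return result;
    }
}
```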
One advantage of async/await here is that it can (given the right circumstances) reduce the amount of explicit locking/synchronization logic you have to write manually. Here's a kind of dumb example:
private static async Task SomeCPUBoundTask()
{
    // Insert actual CPU-bound task here using Task.Run
    await Task.Delay(100);
}

public static async Task QueueCPUBoundTasks()
{
    List<Task> tasks = new List<Task>();
    // Queue up however many CPU-bound tasks you want
    for (int i = 0; i < 10; i++)
    {
        // We could just call Task.Run(...) directly here
        Task task = SomeCPUBoundTask();
        tasks.Add(task);
    }
    // Wait for all of them to complete
    // Note that I don't have to write any explicit locking logic here,
    // I just tell the framework to wait for all of them to complete
    await Task.WhenAll(tasks);
}
Obviously, I'm assuming here that the tasks are completely parallelizable. Note, too, that you could have used the Thread Pool yourself here, but that would be a little less convenient because you'd need some way to figure out yourself whether all of them had completed (rather than just letting the framework figure that out for you). You may also have been able to use a Parallel.For loop here.
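To make that comparison concrete, here is a sketch with a hypothetical SumOfSquares job (a real CPU-bound replacement for the Task.Delay placeholder) run three ways: Task.Run plus Task.WhenAll, the raw Thread Pool with a CountdownEvent to track completion by hand, and Parallel.For.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelOptionsDemo
{
    // Hypothetical CPU-bound unit of work: 1^2 + 2^2 + ... + n^2.
    static long SumOfSquares(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++) total += (long)i * i;
        return total;
    }

    // 1) Task.Run + Task.WhenAll: the framework tracks completion for us.
    public static async Task<long[]> WithTasksAsync(int jobs, int n)
    {
        Task<long>[] tasks = Enumerable.Range(0, jobs)
            .Select(_ => Task.Run(() => SumOfSquares(n)))
            .ToArray();
        return await Task.WhenAll(tasks);
    }

    // 2) Raw thread pool: we track completion ourselves with a CountdownEvent.
    public static long[] WithThreadPool(int jobs, int n)
    {
        var results = new long[jobs];
        // Not disposed: CountdownEvent.Dispose isn't safe to call concurrently
        // with other members, and the final Signal() may still be running
        // when Wait() returns.
        var remaining = new CountdownEvent(jobs);
        for (int i = 0; i < jobs; i++)
        {
            int slot = i; // copy the loop variable for the closure
            ThreadPool.QueueUserWorkItem(_ =>
            {
                results[slot] = SumOfSquares(n);
                remaining.Signal();
            });
        }
        remaining.Wait();
        return results;
    }

    // 3) Parallel.For: blocks the calling thread until every iteration is done.
    public static long[] WithParallelFor(int jobs, int n)
    {
        var results = new long[jobs];
        Parallel.For(0, jobs, i => results[i] = SumOfSquares(n));
        return results;
    }
}
```

The ThreadPool version is where the extra bookkeeping shows up: the slot copy, the countdown, and the decision about disposal are all things WhenAll and Parallel.For handle for you.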