Task.StartNew() vs Parallel.ForEach : Multiple Web Requests Scenario

Posted by 一世执手 on 2019-12-08 16:54:26

Question


I have read through all the related questions on SO, but I am still a little confused about the best approach for my scenario, where multiple web service calls are fired.

I have an aggregator service that takes an input, parses and translates it into multiple web requests, makes the web request calls (unrelated, so could be fired in parallel) and consolidates the responses, which are sent back to the caller. The following code is used right now:

list.ForEach((object obj) =>
{
    tasks.Add(Task.Factory.StartNew((object state) =>
    {
        this.ProcessRequest(obj);
    }, obj, CancellationToken.None, TaskCreationOptions.AttachedToParent, TaskScheduler.Default));
});
await Task.WhenAll(tasks);

The await Task.WhenAll(tasks) comes from Scott Hanselman's post, where it is said that:

"A better solution from a scalability perspective, says Stephen, is to take advantage of asynchronous I/O. When you're calling out across the network, there's no reason (other than convenience) to blocks threads while waiting for the response to come back"

The existing code appears to consume too many threads and the Processor Time shoots up to 100% on production load and that gets me thinking.

The other alternative is to use Parallel.ForEach, which uses a partitioner but also "blocks" the call, which is fine for my scenario.

Considering this is all "Async IO" work and not "CPU bound" work, and the web requests are not long running (they return in 3 seconds at most), I tend to believe the existing code is good enough. But would this provide better throughput than Parallel.ForEach? Parallel.ForEach probably uses a "minimal" number of Tasks because of the partitioning, and therefore makes optimal use of threads(?). I did test Parallel.ForEach with some local tests and it doesn't appear to be any better.

The goal is to reduce CPU time and increase throughput, and therefore scale better. Is there a better approach for handling web requests in parallel?

Appreciate any inputs, thanks.

EDIT: ProcessRequest method shown in the code sample indeed uses HttpClient and its async methods to fire requests (PostAsync, GetAsync, PutAsync).


Answer 1:


> makes the web request calls (unrelated, so could be fired in parallel)

What you actually want is to call them concurrently, not in parallel. That is, "at the same time", not "using multiple threads".

> The existing code appears to consume too many threads

Yeah, I think so too. :)

> Considering this is all "Async IO" work and not "CPU bound" work

Then it should all be done asynchronously, and not using task parallelism or other parallel code.

As Antii pointed out, you should make your asynchronous code asynchronous:

public async Task ProcessRequestAsync(...);

Then what you want to do is consume it using asynchronous concurrency (Task.WhenAll), not parallel concurrency (StartNew/Run/Parallel):

await Task.WhenAll(list.Select(x => ProcessRequestAsync(x)));
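As a rough end-to-end sketch of that pattern, assuming ProcessRequestAsync boils down to a single HttpClient call: the names StubHandler, Aggregator and ProcessAllAsync are hypothetical and not from the question, and the stub handler stands in for the real web services so the example runs without a network.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the real web services: answers every request
// with 200 OK after a short asynchronous delay, so no network is needed.
public sealed class StubHandler : HttpMessageHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        await Task.Delay(100, cancellationToken);
        return new HttpResponseMessage(HttpStatusCode.OK);
    }
}

public static class Aggregator
{
    // One HttpClient instance shared by all requests.
    private static readonly HttpClient Client = new HttpClient(new StubHandler());

    // Assumed shape of ProcessRequestAsync: it awaits the I/O instead of
    // holding a thread while the response is pending.
    public static async Task<HttpStatusCode> ProcessRequestAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            return response.StatusCode;
        }
    }

    // Starts every request, then awaits them all together.
    // No StartNew, Task.Run or Parallel is involved.
    public static Task<HttpStatusCode[]> ProcessAllAsync(IEnumerable<string> urls)
    {
        return Task.WhenAll(urls.Select(url => ProcessRequestAsync(url)));
    }
}
```

A caller would write something like var statuses = await Aggregator.ProcessAllAsync(urls); and the results come back in the same order as the input URLs.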



Answer 2:


If you are CPU bound (you are: "Processor Time shoots up to 100%"), you need to reduce CPU usage. Async IO does nothing to help with that. If anything, it causes a little more CPU usage (unnoticeable here).

Profile the app to see what takes so much CPU time and optimize that code.

The way you initiate parallelism (Parallel, Task, async IO) does not change the efficiency of the parallel action itself. The network does not get faster if you call it in an async way. It's still the same hardware. Nor does it reduce CPU usage.

Determine the optimal degree of parallelism experimentally and choose a parallelism technique that is suitable for that degree. If it's a few dozen, then threads are totally fine. If it's in the hundreds, seriously consider async IO.
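If you go the async IO route, one possible way to experiment with the degree of concurrency is to cap the number of operations in flight with a SemaphoreSlim. This is only a sketch: Throttled, RunAsync and maxConcurrency are hypothetical names, and the right limit is whatever your measurements show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs work(item) for every item, with at most maxConcurrency
    // operations in flight at any moment.
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> work)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();   // wait for a free slot without blocking a thread
                try
                {
                    return await work(item);
                }
                finally
                {
                    gate.Release();       // free the slot even if work throws
                }
            }).ToList();                  // ToList creates all tasks now; the gate limits them

            // Results are returned in the same order as the input items.
            return await Task.WhenAll(tasks);
        }
    }
}
```

Changing maxConcurrency between runs and comparing throughput and CPU time is one way to find the degree that suits the production load.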




Answer 3:


Wrapping synchronous calls inside Task.Factory.StartNew doesn't give you any of the benefits of async. You should use proper async functions for better scalability. Notice how Scott Hanselman writes async functions in the post you are referring to.

For example

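// Requires: using System.Net; using System.Threading.Tasks;
public async Task<bool> ValidateUrlAsync(string url)
{
    using (var response = (HttpWebResponse)await WebRequest.Create(url).GetResponseAsync())
    {
        // True only for a 200 OK response.
        return response.StatusCode == HttpStatusCode.OK;
    }
}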

Check out http://blogs.msdn.com/b/pfxteam/archive/2012/03/24/10287244.aspx

So your ProcessRequest method should be implemented as async, like:

public async Task<bool> ProcessRequestAsync(...)

Then you can just write:

tasks.Add(this.ProcessRequestAsync(obj));

If you start a task with Task.Factory.StartNew, it doesn't behave asynchronously even if your ProcessRequest method makes async calls internally. If you want to use Task.Factory, you should make your lambda async as well. StartNew then returns a Task<Task>, so call Unwrap() to get a task that completes only when the awaited work has finished:

tasks.Add(Task.Factory.StartNew(async (object state) =>
{
    await this.ProcessRequestAsync(obj);
}, obj, CancellationToken.None, TaskCreationOptions.AttachedToParent, TaskScheduler.Default).Unwrap());
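
To see why the type returned by StartNew matters with an async lambda, here is a minimal sketch. StartNewDemo, ObserveAsync and gate are hypothetical names, and the TaskCompletionSource merely stands in for a web request that is still pending. StartNew hands back a Task<Task> whose outer task finishes as soon as the lambda reaches its first await, while Unwrap() (or Task.Run, which unwraps for you) yields a task that tracks the actual work.

```csharp
using System;
using System.Threading.Tasks;

public static class StartNewDemo
{
    // Returns, in order: outer completed, unwrapped completed early,
    // Task.Run completed early, unwrapped completed at end, Task.Run completed at end.
    public static async Task<bool[]> ObserveAsync()
    {
        // Stands in for a web request that has not finished yet.
        var gate = new TaskCompletionSource<bool>();

        // With an async lambda, StartNew returns Task<Task>.
        Task<Task> outer = Task.Factory.StartNew(async () => { await gate.Task; });
        Task unwrapped = outer.Unwrap();
        Task viaRun = Task.Run(async () => { await gate.Task; });

        // The outer task completes once the lambda hits its first await,
        // even though the "request" is still pending.
        await outer;
        bool unwrappedEarly = unwrapped.IsCompleted;
        bool runEarly = viaRun.IsCompleted;

        // Let the "request" finish; only now do the other two tasks complete.
        gate.SetResult(true);
        await Task.WhenAll(unwrapped, viaRun);

        return new[]
        {
            outer.IsCompleted, unwrappedEarly, runEarly,
            unwrapped.IsCompleted, viaRun.IsCompleted
        };
    }
}
```

Passing the outer task to Task.WhenAll would therefore return before the requests are done, which is why the unwrapped task (or simply the Task returned by ProcessRequestAsync itself) is the one to collect.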


Source: https://stackoverflow.com/questions/30657202/task-startnew-vs-parallel-foreach-multiple-web-requests-scenario
