Question
In one of my projects, which is a kind of aggregator, I parse feeds, podcasts and so on from the web.
With a sequential approach and a large number of resources, processing all of them takes quite a while (because of network latency and similar issues):
foreach (var feed in feeds)
{
    read_from_web(feed);
    parse(feed);
}
So I want to implement concurrency, but I can't decide whether I should use the ThreadPool directly and process with worker threads, or just rely on the TPL to sort it out.
The ThreadPool will certainly handle the job with worker threads and I'll get what I expect (and in multi-core CPU environments the other cores will be utilized as well).
But I still want to consider the TPL, as it's the recommended method, though I'm a bit concerned about it. First of all, I know that the TPL uses the ThreadPool but adds an additional layer of decision making. I'm mostly concerned about the case of a single-core environment. If I'm not wrong, the TPL starts with a number of worker threads equal to the number of available CPU cores. I fear that the TPL will produce results similar to the sequential approach for my IO-bound case.
So for IO-bound operations (in my case reading resources from the web), is it best to use the ThreadPool and control things myself, or is it better to just rely on the TPL? Can the TPL also be used in IO-bound scenarios?
Update: My main concern is this: on a single-core CPU environment, will the TPL just behave like the sequential approach, or will it still offer concurrency? I'm already reading the book Parallel Programming with Microsoft .NET but couldn't find an exact answer to this.
Note: this is a rephrasing of my previous question [ Is it possible to use thread-concurrency and parallelism together? ], which was phrased quite badly.
Answer 1:
So I decided to write tests for this instead and look at practical data.
Test Legend
- Itr: Iteration
- Seq: Sequential Approach.
- PrlEx: Parallel Extensions - Parallel.ForEach
- TPL: Task Parallel Library
- TPool: ThreadPool
Test Results
Single-Core CPU [Win7-32] -- runs under VMWare --
Test Environment: 1 physical cpus, 1 cores, 1 logical cpus.
Will be parsing a total of 10 feeds.
________________________________________________________________________________
Itr. Seq. PrlEx TPL TPool
________________________________________________________________________________
#1 10.82s 04.05s 02.69s 02.60s
#2 07.48s 03.18s 03.17s 02.91s
#3 07.66s 03.21s 01.90s 01.68s
#4 07.43s 01.65s 01.70s 01.76s
#5 07.81s 02.20s 01.75s 01.71s
#6 07.67s 03.25s 01.97s 01.63s
#7 08.14s 01.77s 01.72s 02.66s
#8 08.04s 03.01s 02.03s 01.75s
#9 08.80s 01.71s 01.67s 01.75s
#10 10.19s 02.23s 01.62s 01.74s
________________________________________________________________________________
Avg. 08.40s 02.63s 02.02s 02.02s
________________________________________________________________________________
Single-Core CPU [WinXP] -- runs under VMWare --
Test Environment: 1 physical cpus, NotSupported cores, NotSupported logical cpus.
Will be parsing a total of 10 feeds.
________________________________________________________________________________
Itr. Seq. PrlEx TPL TPool
________________________________________________________________________________
#1 10.79s 04.05s 02.75s 02.13s
#2 07.53s 02.84s 02.08s 02.07s
#3 07.79s 03.74s 02.04s 02.07s
#4 08.28s 02.88s 02.73s 03.43s
#5 07.55s 02.59s 03.99s 03.19s
#6 07.50s 02.90s 02.83s 02.29s
#7 07.80s 04.32s 02.78s 02.67s
#8 07.65s 03.10s 02.07s 02.53s
#9 10.70s 02.61s 02.04s 02.10s
#10 08.98s 02.88s 02.09s 02.16s
________________________________________________________________________________
Avg. 08.46s 03.19s 02.54s 02.46s
________________________________________________________________________________
Dual-Core CPU [Win7-64]
Test Environment: 1 physical cpus, 2 cores, 2 logical cpus.
Will be parsing a total of 10 feeds.
________________________________________________________________________________
Itr. Seq. PrlEx TPL TPool
________________________________________________________________________________
#1 07.09s 02.28s 02.64s 01.79s
#2 06.04s 02.53s 01.96s 01.94s
#3 05.84s 02.18s 02.08s 02.34s
#4 06.00s 01.43s 01.69s 01.43s
#5 05.74s 01.61s 01.36s 01.49s
#6 05.92s 01.59s 01.73s 01.50s
#7 06.09s 01.44s 02.14s 02.37s
#8 06.37s 01.34s 01.46s 01.36s
#9 06.57s 01.30s 01.58s 01.67s
#10 06.06s 01.95s 02.88s 01.62s
________________________________________________________________________________
Avg. 06.17s 01.76s 01.95s 01.75s
________________________________________________________________________________
Quad-Core CPU [Win7-64] -- HyperThreading Supported --
Test Environment: 1 physical cpus, 4 cores, 8 logical cpus.
Will be parsing a total of 10 feeds.
________________________________________________________________________________
Itr. Seq. PrlEx TPL TPool
________________________________________________________________________________
#1 10.56s 02.03s 01.71s 01.69s
#2 07.42s 01.63s 01.71s 01.69s
#3 11.66s 01.69s 01.73s 01.61s
#4 07.52s 01.77s 01.63s 01.65s
#5 07.69s 02.32s 01.67s 01.62s
#6 07.31s 01.64s 01.53s 02.17s
#7 07.44s 02.56s 02.35s 02.31s
#8 08.36s 01.93s 01.73s 01.66s
#9 07.92s 02.15s 01.72s 01.65s
#10 07.60s 02.14s 01.68s 01.68s
________________________________________________________________________________
Avg. 08.35s 01.99s 01.75s 01.77s
________________________________________________________________________________
Summary
- Whether you run on a single-core environment or a multi-core one, Parallel Extensions, TPL and ThreadPool behave similarly and give comparable results.
- Still, the TPL has advantages such as easy exception handling, cancellation support and the ability to easily return Task results. Parallel Extensions is another viable alternative.
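The four approaches compared above can be sketched roughly as follows. This is not the author's test source: `FeedBenchmark` and `FetchAndParse` are hypothetical names, and `Thread.Sleep` merely stands in for a blocking download, so the timings it prints will not match the tables.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FeedBenchmark
{
    // Hypothetical stand-in for a blocking download followed by a parse.
    public static int FetchAndParse(string feed)
    {
        Thread.Sleep(100);      // simulates waiting on the network
        return feed.Length;     // simulates a parse result
    }

    // Seq: one feed after another on the calling thread.
    public static void RunSequential(IEnumerable<string> feeds)
    {
        foreach (var feed in feeds) FetchAndParse(feed);
    }

    // PrlEx: Parallel.ForEach partitions the feeds over worker threads.
    public static void RunParallelForEach(IEnumerable<string> feeds)
    {
        Parallel.ForEach(feeds, feed => FetchAndParse(feed));
    }

    // TPL: one task per feed, then wait for all of them.
    public static void RunTasks(IEnumerable<string> feeds)
    {
        Task[] tasks = feeds
            .Select(feed => Task.Factory.StartNew(() => FetchAndParse(feed)))
            .ToArray();
        Task.WaitAll(tasks);
    }

    // TPool: one work item per feed, with a CountdownEvent to detect completion.
    public static void RunThreadPool(IEnumerable<string> feeds)
    {
        var list = feeds.ToList();
        if (list.Count == 0) return;
        using (var done = new CountdownEvent(list.Count))
        {
            foreach (var feed in list)
            {
                ThreadPool.QueueUserWorkItem(_ => { FetchAndParse(feed); done.Signal(); });
            }
            done.Wait();
        }
    }

    // Measures the wall-clock time of a single run.
    public static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        return sw.Elapsed;
    }

    public static void Main()
    {
        var feeds = Enumerable.Range(1, 10).Select(i => "feed" + i).ToArray();
        Console.WriteLine("Seq:   {0}", Time(() => RunSequential(feeds)));
        Console.WriteLine("PrlEx: {0}", Time(() => RunParallelForEach(feeds)));
        Console.WriteLine("TPL:   {0}", Time(() => RunTasks(feeds)));
        Console.WriteLine("TPool: {0}", Time(() => RunThreadPool(feeds)));
    }
}
```

Note that the ThreadPool variant needs its own completion signalling, while the TPL variant gets waiting (and exception propagation) from `Task.WaitAll`.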
Running tests on your own
You can download the source here and run it on your own. If you post the results, I'll add them as well.
Update: Fixed the source link.
Answer 2:
If you're trying to maximize throughput for IO-bound tasks, you absolutely must combine the traditional Asynchronous Programming Model (APM) APIs with your TPL-based work. The APM APIs are the only way to unblock the CPU thread while the asynchronous IO callback is pending. The TPL provides the TaskFactory.FromAsync helper method to assist in combining APM and TPL code.
Check out the section of the .NET SDK documentation on MSDN entitled TPL and Traditional .NET Asynchronous Programming for more information on how to combine these two programming models to achieve async nirvana.
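A minimal sketch of wrapping an APM Begin/End pair with `FromAsync` might look like this. It uses `Stream.BeginRead`/`EndRead` on an in-memory stream so that it runs without a network; for web downloads the analogous pair would be `WebRequest.BeginGetResponse`/`EndGetResponse`. The names `FromAsyncDemo` and `ReadAsTask` are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class FromAsyncDemo
{
    // Wraps the APM pair Stream.BeginRead/EndRead in a Task<int>
    // that completes with the number of bytes read.
    public static Task<int> ReadAsTask(Stream stream, byte[] buffer)
    {
        return Task<int>.Factory.FromAsync(
            stream.BeginRead, stream.EndRead, buffer, 0, buffer.Length, null);
    }

    public static void Main()
    {
        byte[] data = Encoding.UTF8.GetBytes("hello feed");
        using (var stream = new MemoryStream(data))
        {
            var buffer = new byte[64];
            // The continuation runs once the read has completed.
            Task<string> text = ReadAsTask(stream, buffer)
                .ContinueWith(t => Encoding.UTF8.GetString(buffer, 0, t.Result));
            Console.WriteLine(text.Result); // prints "hello feed"
        }
    }
}
```

With a real network stream, no thread is held while the read is pending; the resulting task can then be composed with continuations like any other TPL task.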
Answer 3:
You are right that the TPL removes some of the control you have when you create your own thread pool. But this only holds if you do not want to dig deeper. The TPL allows you to create long-running Tasks that are not part of the TPL thread pool, which could serve your purpose well. The book Parallel Programming with Microsoft .NET, which is free to read, will give you much more insight into how the TPL is meant to be used. You always have the option of giving Parallel.For and Tasks explicit parameters for how many threads should be allocated. Besides this, you can replace the TPL scheduler with your own if you want full control.
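The two knobs mentioned above could be sketched as follows, assuming `ParallelOptions.MaxDegreeOfParallelism` for capping concurrency and `TaskCreationOptions.LongRunning` for requesting a thread outside the pool. The class and method names are hypothetical, and `LongRunning` is documented as a hint, so the dedicated-thread behaviour is what current implementations of the default scheduler do rather than a contract.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelControlDemo
{
    // Runs itemCount simulated IO operations with at most maxWorkers
    // running at once, and returns the peak concurrency observed.
    public static int RunCapped(int itemCount, int maxWorkers)
    {
        var gate = new object();
        int current = 0, peak = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxWorkers };

        Parallel.For(0, itemCount, options, i =>
        {
            lock (gate) { current++; if (current > peak) peak = current; }
            Thread.Sleep(20);               // simulates blocking IO
            lock (gate) { current--; }
        });
        return peak;
    }

    // Starts a task with the LongRunning hint and reports whether it
    // ended up on a thread pool thread.
    public static Task<bool> RunLongRunning()
    {
        return Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            TaskCreationOptions.LongRunning);
    }

    public static void Main()
    {
        Console.WriteLine("Peak concurrency: " + RunCapped(20, 3));
        Console.WriteLine("On pool thread:   " + RunLongRunning().Result);
    }
}
```

For IO-bound work, raising `MaxDegreeOfParallelism` above the core count is one way to keep more blocking requests in flight at the cost of more threads.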
Answer 4:
You can assign your own task scheduler to a TPL task. The default work-stealing one is quite clever, though.
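As a rough illustration of assigning a custom scheduler, the sketch below defines a deliberately naive `TaskScheduler` that starts one new thread per task. `ThreadPerTaskScheduler` and `SchedulerDemo` are hypothetical names; this is not a recommendation over the default scheduler, only a demonstration of the extension point.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical scheduler for illustration: every task gets its own thread.
public sealed class ThreadPerTaskScheduler : TaskScheduler
{
    protected override void QueueTask(Task task)
    {
        var thread = new Thread(() => TryExecuteTask(task)) { IsBackground = true };
        thread.Start();
    }

    // Never run a task inline on a waiting thread; always use the dedicated thread.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false;
    }

    // Used by debuggers only; this scheduler keeps no queue to report.
    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return Enumerable.Empty<Task>();
    }
}

public static class SchedulerDemo
{
    // Runs a task on the given scheduler and reports whether it
    // executed on a thread pool thread.
    public static Task<bool> RunOn(TaskScheduler scheduler)
    {
        return Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            TaskCreationOptions.None,
            scheduler);
    }

    public static void Main()
    {
        bool onPool = RunOn(new ThreadPerTaskScheduler()).Result;
        Console.WriteLine("Ran on pool thread: " + onPool);
    }
}
```

The scheduler is passed per task through the `StartNew` overload that takes a `TaskScheduler`; tasks started without it keep using the default one.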
Answer 5:
"I do fear of TPL producing similar results to sequential approach for my IO-bound case."
I think it will. What is the bottleneck? Is it parsing or downloading? Multithreading will not help you much with downloading from the web.
I would use the Task Parallel Library for cropping, applying masks or effects to downloaded images, cutting a sample from a podcast, etc. It's more scalable.
But it will not be an order-of-magnitude speedup. Spend your resources on implementing features and on testing instead.
PS. "Wow, my function executes in 0.7 s instead of 0.9" ;)
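Answer 6:
If you parallelize your calls to the URLs, I think it will improve your application even if you have only one core. Take a look at this code:
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

var client = new HttpClient();
var urls = new[] { "a", "url", "to", "find" };
// GetAsync returns a Task (Task-based Asynchronous Pattern), so the requests run concurrently.
var tasks = urls.Select(url => client.GetAsync(url));
// AnalyzeThisWords is a placeholder for your own processing of the responses.
var result = Task.WhenAll(tasks).ContinueWith(t => AnalyzeThisWords(t.Result));
result.Wait(); // blocks until everything has finished; not sure whether calling Wait here is needed or correct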
The difference between multithreading and asynchrony in this case is how the callback/completion is done.
With task-based asynchronous calls like this (GetAsync follows the Task-based Asynchronous Pattern, TAP, rather than EAP), the number of tasks is not related to the number of threads.
As you're relying on the GetAsync task, the HTTP client uses a network stream (socket, TCP client or whatever) and signals it to raise an event when the BeginRead/EndRead is done. So no threads are involved at this point.
After the completion is called, a new thread may be created, but it's up to the TaskScheduler (used in the GetAsync/ContinueWith call) to create a new thread, use an existing thread, or inline the task on the calling thread.
If AnalyzeThisWords blocks for too long, you start to get bottlenecks, because the "callback" on ContinueWith runs on a thread pool worker.
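The point that the number of in-flight operations is independent of the number of threads can be illustrated with a small sketch. It assumes `Task.Delay` as a stand-in for a network call (it waits on a timer without occupying a thread), and `AsyncIoDemo`, `FakeDownloadAsync` and `DownloadAllAsync` are hypothetical names.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class AsyncIoDemo
{
    // Hypothetical stand-in for an async download; no thread is blocked
    // while the delay is pending.
    public static async Task<int> FakeDownloadAsync(string url)
    {
        await Task.Delay(200);
        return url.Length;
    }

    // Starts every "download" immediately, then awaits them together.
    // Results come back in the same order as the input URLs.
    public static async Task<int[]> DownloadAllAsync(string[] urls)
    {
        Task<int>[] tasks = urls.Select(url => FakeDownloadAsync(url)).ToArray();
        return await Task.WhenAll(tasks);
    }

    public static void Main()
    {
        var urls = new[] { "a", "url", "to", "find" };
        var sw = Stopwatch.StartNew();
        int[] lengths = DownloadAllAsync(urls).GetAwaiter().GetResult();
        Console.WriteLine(string.Join(",", lengths)); // prints "1,3,2,4"
        Console.WriteLine("Elapsed ms: " + sw.ElapsedMilliseconds);
    }
}
```

Because the waits overlap, the elapsed time should be close to a single delay rather than the sum of all of them, regardless of how many cores the machine has.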
Source: https://stackoverflow.com/questions/5213695/should-i-use-threadpools-or-task-parallel-library-for-io-bound-operations