I've been testing the performance of System.Threading.Parallel vs. plain threading and I'm surprised to see Parallel taking longer to finish tasks than threading. I'm sure it's d
Referring to a blog post by Reed Copsey Jr:
Parallel.ForEach is a bit more complicated, however. When working with a generic IEnumerable, the number of items required for processing is not known in advance, and must be discovered at runtime. In addition, since we don’t have direct access to each element, the scheduler must enumerate the collection to process it. Since IEnumerable is not thread safe, it must lock on elements as it enumerates, create temporary collections for each chunk to process, and schedule this out.
The locking and copying could make Parallel.ForEach take longer. The partitioning and scheduling that ForEach does also add overhead. I tested your code and increased the sleep of each task; the results are then closer, but ForEach is still slower.
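If the source is actually an indexable collection, one way to sidestep the generic IEnumerable chunking path described above is to hand ForEach a partitioner built over the list. A minimal sketch (the item count and the Sleep stand in for the work in the question):

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class PartitionerSketch
{
    static void Main()
    {
        var items = new List<int>(10000);
        for (int i = 0; i < 10000; i++) items.Add(i);

        // A load-balancing partitioner over the IList avoids the generic
        // IEnumerable path (lock, enumerate, copy into temporary chunks).
        var partitioner = Partitioner.Create(items, loadBalance: true);

        Parallel.ForEach(partitioner, item =>
        {
            Thread.Sleep(1); // stand-in for the short per-item task
        });
    }
}

Whether this closes the gap depends on how heavy the per-item work is; for very short tasks the scheduling overhead still dominates.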
[Edit - more research]
I added the following to the execution loops:
if (Thread.CurrentThread.ManagedThreadId > maxThreadId)
    maxThreadId = Thread.CurrentThread.ManagedThreadId;
What this shows on my machine is that ForEach uses about 10 fewer threads than the manual threading version with the current settings. If you want more threads out of ForEach, you have to fiddle with ParallelOptions and the scheduler.
See Does Parallel.ForEach limits the number of active threads?
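For the ParallelOptions part, a minimal sketch looks like the following. Note that MaxDegreeOfParallelism is only an upper bound, not a guarantee, so the scheduler may still use fewer threads; the loop bounds and the Sleep are stand-ins for the test in the question:

using System;
using System.Threading;
using System.Threading.Tasks;

class DegreeOfParallelismSketch
{
    static void Main()
    {
        var options = new ParallelOptions
        {
            // Upper bound only: the scheduler may still decide to use fewer threads.
            MaxDegreeOfParallelism = 10
        };

        Parallel.For(0, 100, options, i =>
        {
            Console.WriteLine($"Iteration {i} on thread {Thread.CurrentThread.ManagedThreadId}");
            Thread.Sleep(10); // stand-in for the short task being timed
        });
    }
}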
It's logical :-)
That would be the first time in history that adding one (or two) layers of code improved performance. When you use convenience libraries you should expect to pay a price. BTW, you haven't posted the numbers. Got to publish results :-)
To make things a bit more fair (or biased :-) towards the Parallel variants, convert the list into an array.
Then, to make it totally unfair, split the work yourself: make an array of just 10 items and spoon-feed the actions to Parallel, as in the sketch below. You are of course doing the job that Parallel promised to do for you at this point, but it's bound to be an interesting number :-)
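A rough sketch of that "spoon-feeding" variant, assuming the list has already been converted to an array and using a Sleep as a stand-in for the real work:

using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class SpoonFeedSketch
{
    static void Main()
    {
        int[] data = Enumerable.Range(0, 1000).ToArray(); // list already converted to an array
        const int workers = 10;
        int chunk = data.Length / workers;

        // Pre-split the work into 10 coarse slices so Parallel only has to
        // schedule 10 items and does none of its own partitioning.
        var slices = Enumerable.Range(0, workers)
            .Select(w => data
                .Skip(w * chunk)
                .Take(w == workers - 1 ? data.Length - w * chunk : chunk)
                .ToArray())
            .ToArray();

        Parallel.ForEach(slices, slice =>
        {
            foreach (int item in slice)
            {
                Thread.Sleep(1); // stand-in for the per-item work
            }
        });
    }
}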
BTW, I just read Reed's blog. The partitioning used in this question is what he calls the simplest and most naive partitioning, which makes it a very good elimination test indeed. You still need to check the zero-work case just to know whether it's totally hosed.
I think I can answer your question. First of all, you didn't write how many cores your system has. If you are running a dual-core, only 4 threads will work when using Parallel.For, while you are working with 10 threads in your Thread example. More threads work better here because the task you are running (printing plus a short sleep) is very short, and the threading overhead is large compared to the task itself; I'm almost sure that if you wrote the same code without threads it would run faster.
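To see what your machine actually reports, you can print the logical processor count, which is roughly what the default degree of parallelism tracks:

using System;

class CoreCountCheck
{
    static void Main()
    {
        // Logical processors (cores x hyper-threading).
        Console.WriteLine($"Logical processors: {Environment.ProcessorCount}");
    }
}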
Both your methods work pretty much the same, but if you create all the threads in advance you save a lot, since Parallel.For schedules its work as Tasks on the thread pool, which adds some more overhead.
The comparison is not very fair with regard to Threading.Parallel. You tell your custom thread pool that it will need 10 threads. Threading.Parallel does not know how many threads it will need, so it tries to adapt at run time, taking into account things such as the current CPU load. Since the number of iterations in the test is small, you can see this thread-count adaptation penalty. Providing the same hint to Threading.Parallel will make it run much faster:
int workerThreads;
int completionPortThreads;
ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
ThreadPool.SetMinThreads(10, completionPortThreads);
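Placed in context, the hint goes right before the parallel loop. A minimal sketch, where the loop body is only a stand-in for the work in the original test:

using System;
using System.Threading;
using System.Threading.Tasks;

class MinThreadsHintSketch
{
    static void Main()
    {
        int workerThreads;
        int completionPortThreads;
        ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);

        // Keep at least 10 worker threads ready, matching the hint the
        // custom thread pool in the question already gets.
        ThreadPool.SetMinThreads(10, completionPortThreads);

        Parallel.For(0, 10, i =>
        {
            Console.WriteLine($"Task {i} on thread {Thread.CurrentThread.ManagedThreadId}");
            Thread.Sleep(100); // stand-in for the work being timed
        });
    }
}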