Recently, I answered a question about optimizing a likely parallelizable method for generating every permutation of arbitrary base numbers. I posted an answer similar to the
First off, my initial assumption regarding `Parallel.For()` and `Parallel.ForEach()` was wrong.
The poor parallel implementation very likely has 6 threads all attempting to write to a single `ConcurrentStack<T>` at once. The good implementation, using thread-local state (explained more below), only accesses the shared variable once per task, nearly eliminating any contention.
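To make the contrast concrete, here is a minimal sketch of both patterns. It assumes a hypothetical workload (squaring each index) in place of the permutation generator from the original question; only the shape of the shared writes matters here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

const int n = 100_000;

// Contended pattern: every iteration pushes straight into the shared stack,
// so all worker threads compete for the same collection on every iteration.
var contended = new ConcurrentStack<long>();
Parallel.For(0, n, i =>
{
    contended.Push((long)i * i);
});

// Per-task pattern: each task fills its own List<long> without any
// synchronization, then pushes its whole batch into the shared stack once.
var batched = new ConcurrentStack<long>();
Parallel.For(0, n,
    () => new List<long>(),                  // localInit: one list per task
    (i, state, local) =>                     // body: touches only task-local state
    {
        local.Add((long)i * i);
        return local;
    },
    local =>                                 // localFinally: one shared write per task
    {
        // Guard against an empty batch rather than pushing a zero-length array.
        if (local.Count > 0) batched.PushRange(local.ToArray());
    });

Console.WriteLine($"{contended.Count} {batched.Count}"); // prints "100000 100000"
```

Both stacks end up with the same contents; the difference is how many times the threads have to coordinate on the shared collection to get there.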
When using `Parallel.For()` and `Parallel.ForEach()`, you cannot simply treat them as drop-in replacements for a `for` or `foreach` loop. That's not to say a blind substitution couldn't improve things, but without examining the problem and measuring it, using them is just throwing multithreading at the problem in the hope that it makes it faster.
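A quick way to start instrumenting is a `Stopwatch` comparison of the sequential and parallel versions. This is a rough sketch: `Work` is a hypothetical stand-in for the real loop body, and a single timed run like this includes JIT and thread-pool warm-up, so a dedicated benchmarking tool such as BenchmarkDotNet gives more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical CPU-bound work item standing in for the real loop body.
static long Work(int x)
{
    long acc = 0;
    for (int k = 0; k < 1_000; k++) acc += (x ^ k) % 7;
    return acc;
}

const int count = 20_000;

// Time the plain sequential loop.
var sw = Stopwatch.StartNew();
long sequential = 0;
for (int x = 0; x < count; x++) sequential += Work(x);
sw.Stop();
Console.WriteLine($"for:          {sw.ElapsedMilliseconds} ms");

// Time the parallel loop, using per-task totals merged once per task.
sw.Restart();
long parallel = 0;
Parallel.For(0, count, () => 0L,
    (i, state, local) => local + Work(i),
    local => Interlocked.Add(ref parallel, local));
sw.Stop();
Console.WriteLine($"Parallel.For: {sw.ElapsedMilliseconds} ms");
```

Checking that both versions produce the same result matters as much as the timings; a faster loop that computes the wrong answer is not an improvement.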
**`Parallel.For()` and `Parallel.ForEach()` have overloads that let you create local state for each `Task` they create, and run a delegate when each task starts (to initialize that state) and when it finishes (to consume it).**
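The "once per task, not once per iteration" behavior can be observed directly by counting how often each delegate runs. A small sketch, assuming nothing beyond the standard overload; the exact number of tasks depends on the machine and scheduler, so it is printed rather than predicted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

const int iterations = 1_000_000;
int initCalls = 0, finallyCalls = 0;
long total = 0;

Parallel.For(0, iterations,
    () => { Interlocked.Increment(ref initCalls); return 0L; }, // localInit: once per task
    (i, state, local) => local + 1,                             // body: once per iteration
    local =>
    {
        Interlocked.Increment(ref finallyCalls);                // localFinally: once per task
        Interlocked.Add(ref total, local);
    });

// initCalls is usually far smaller than the iteration count.
Console.WriteLine($"iterations={total}, localInit calls={initCalls}, localFinally calls={finallyCalls}");
```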
If you have an operation you parallelize with `Parallel.For()` or `Parallel.ForEach()`, it's likely a good idea to use this overload:
```csharp
public static ParallelLoopResult For<TLocal>(
    int fromInclusive,
    int toExclusive,
    Func<TLocal> localInit,
    Func<int, ParallelLoopState, TLocal, TLocal> body,
    Action<TLocal> localFinally
)
```
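`Parallel.ForEach()` has the analogous overload `ForEach<TSource, TLocal>(IEnumerable<TSource> source, Func<TLocal> localInit, Func<TSource, ParallelLoopState, TLocal, TLocal> body, Action<TLocal> localFinally)`. A minimal sketch, assuming a hypothetical list of words whose total length we want:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical input; any IEnumerable<TSource> works as the source.
var words = new List<string> { "alpha", "beta", "gamma", "delta", "epsilon" };
long totalLength = 0;

Parallel.ForEach(words,
    () => 0L,                                          // localInit: per-task running total
    (word, state, local) => local + word.Length,       // body: no shared writes here
    local => Interlocked.Add(ref totalLength, local)); // localFinally: merge once per task

Console.WriteLine(totalLength); // prints 26
```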
For example, to sum all integers from 1 to 100 with `For()`:
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var total = 0;
Parallel.For(0, 101, () => 0,          // <-- localInit
    (i, state, localTotal) => {         // <-- body
        localTotal += i;
        return localTotal;
    }, localTotal => {                  // <-- localFinally
        Interlocked.Add(ref total, localTotal);
    });
Console.WriteLine(total);               // prints 5050
```
`localInit` should be a lambda that creates the initial local state. That state is passed to `body`, which returns the updated state for the next iteration, and the final value is passed to `localFinally`. Please note I am not recommending parallelizing a sum of 1 to 100; it's just a simple example to keep things short.
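The local state doesn't have to be a number. As a sketch, assuming a hypothetical array of pseudo-random values, here `TLocal` is a value tuple tracking a running minimum and maximum. Since `Interlocked` has no min/max operation, `localFinally` merges under a `lock`, which is still cheap because it runs only once per task.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical data set: pseudo-random values from a fixed seed.
var rng = new Random(42);
var data = new int[100_000];
for (int k = 0; k < data.Length; k++) data[k] = rng.Next(-1_000_000, 1_000_000);

var gate = new object();
int globalMin = int.MaxValue, globalMax = int.MinValue;

Parallel.For(0, data.Length,
    // localInit: TLocal is a (Min, Max) tuple rather than a single number
    () => (Min: int.MaxValue, Max: int.MinValue),
    // body: update the task's own tuple and hand it back
    (i, state, local) => (Math.Min(local.Min, data[i]), Math.Max(local.Max, data[i])),
    // localFinally: merge each task's result into the shared values under a lock
    local =>
    {
        lock (gate)
        {
            globalMin = Math.Min(globalMin, local.Min);
            globalMax = Math.Max(globalMax, local.Max);
        }
    });

Console.WriteLine($"min={globalMin}, max={globalMax}");
```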