I have a foreach loop that I am parallelizing and I noticed something odd. The code looks like
double sum = 0.0;
Parallel.ForEach(myCollection, arg => sum += ComplicatedFunction(arg));
Is it possible that the sum variable is being unexpectedly affected by the parallelization?
Yes. Access to a double is not atomic, and the sum += ... operation is never thread-safe, not even for types whose reads and writes are atomic. So you have multiple race conditions and the result is unpredictable.
You could use something like:
double sum = myCollection.AsParallel().Sum(arg => ComplicatedFunction(arg));
or, in a shorter notation
double sum = myCollection.AsParallel().Sum(ComplicatedFunction);
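As a quick sanity check, here is a standalone sketch (ComplicatedFunction is just a stand-in here, not the asker's actual function) showing that the PLINQ sum agrees with the sequential sum up to floating-point reordering error:

```csharp
using System;
using System.Linq;

class PlinqSumDemo
{
    // Stand-in for the question's ComplicatedFunction.
    static double ComplicatedFunction(int x) => Math.Sqrt(x) + 1.0;

    static void Main()
    {
        int[] myCollection = Enumerable.Range(1, 100_000).ToArray();

        double sequential = myCollection.Sum(x => ComplicatedFunction(x));
        double parallel   = myCollection.AsParallel().Sum(ComplicatedFunction);

        // Parallel summation reorders the additions, and floating-point
        // addition is not associative, so compare with a small tolerance.
        Console.WriteLine(Math.Abs(sequential - parallel) / sequential < 1e-6);
    }
}
```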
If you think about that sum += ComplicatedFunction(...) as actually being composed of a bunch of operations, say:
r1 <- Load current value of sum
r2 <- ComplicatedFunction(...)
r1 <- r1 + r2
sum <- Store r1 back into sum
So now we randomly interleave two (or more) parallel instances of this. One thread may be holding a stale "old value" of sum which it uses to perform its computation, the result of which it writes back over top of some modified version of sum. It's a classic race condition, because some results are getting lost in a nondeterministic way based on how the interleaving is done.
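You can see the lost updates directly with a small sketch (a contrived example, not the asker's code: each iteration just adds 1.0 to a shared double with no synchronization):

```csharp
using System;
using System.Threading.Tasks;

class LostUpdateDemo
{
    static void Main()
    {
        double sum = 0.0;

        // 1,000,000 unsynchronized read-modify-write updates of a shared double.
        Parallel.For(0, 1_000_000, i => { sum += 1.0; });

        // Because updates can overwrite each other, the total is typically
        // below 1,000,000, and the exact value changes from run to run.
        Console.WriteLine(sum);
    }
}
```

Updates can only be lost, never duplicated, so the racy total never exceeds the true count; it just comes up short by a nondeterministic amount.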
Or you can use the Parallel Aggregation pattern, as properly defined in .NET. Here is the code:
object locker = new object();
double sum = 0.0;
Parallel.ForEach(
    myCollection,
    () => 0.0,                           // Initialize the local value.
    (item, state, localTotal) =>         // Body delegate; returns the new local total.
        localTotal + ComplicatedFunction(item),
    localTotal =>                        // Add the local value to the master value.
    {
        lock (locker) sum += localTotal;
    });
As the other answers mentioned, updating the sum variable from multiple threads (which is what Parallel.ForEach does) is not a thread-safe operation. The trivial fix of acquiring a lock before doing the update will fix that problem.
double sum = 0.0;
Parallel.ForEach(myCollection, arg =>
{
lock (myCollection)
{
sum += ComplicatedFunction(arg);
}
});
However, that introduces yet another problem. Since the lock is acquired on every iteration, the execution of the iterations is effectively serialized. In other words, you would have been better off with a plain old foreach loop.
Now, the trick to getting this right is to partition the problem into separate and independent chunks. Fortunately that is super easy to do when all you want is to sum the results of the iterations, because the sum operation is commutative and associative and because the intermediate results of the iterations are independent.
So here is how you do it.
double sum = 0.0;
Parallel.ForEach(myCollection,
() => // Initializer
{
return 0D;
},
(item, state, subtotal) => // Loop body
{
return subtotal + ComplicatedFunction(item);
},
(subtotal) => // Accumulator
{
lock (myCollection)
{
sum += subtotal;
}
});
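Putting it together as a runnable sketch (again with a stand-in ComplicatedFunction and a dedicated lock object rather than locking on the collection), the partitioned loop produces the same total as a sequential sum:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class PartitionedSumDemo
{
    // Stand-in for the question's ComplicatedFunction.
    static double ComplicatedFunction(int x) => Math.Sqrt(x);

    static void Main()
    {
        int[] myCollection = Enumerable.Range(1, 100_000).ToArray();
        object locker = new object();
        double sum = 0.0;

        Parallel.ForEach(
            myCollection,
            () => 0.0,                                          // per-thread initializer
            (item, state, subtotal) =>                          // loop body: thread-local accumulation
                subtotal + ComplicatedFunction(item),
            subtotal => { lock (locker) sum += subtotal; });    // merge local totals under the lock

        double expected = myCollection.Sum(x => ComplicatedFunction(x));

        // Tolerance because parallel execution reorders the additions.
        Console.WriteLine(Math.Abs(sum - expected) / expected < 1e-6);
    }
}
```

The lock is now taken only once per worker thread when the local subtotals are merged, not once per iteration, so the body runs without contention.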