Different summation results with Parallel.ForEach

梦谈多话 2020-12-15 18:31

I have a foreach loop that I am parallelizing, and I noticed something odd. The code looks like:

double sum = 0.0;

Parallel.ForEach(myCollection         


        
4 Answers
  • 2020-12-15 18:58

    Is it possible that the sum variable is being unexpectedly affected by the parallelization?

    Yes.
    Access to a double is not guaranteed to be atomic, and the sum += ... operation is never thread-safe, not even for types whose reads and writes are atomic, because it is a separate read, add, and write. So you have multiple race conditions and the result is unpredictable.

    You could use something like:

    double sum = myCollection.AsParallel().Sum(arg => ComplicatedFunction(arg));
    

    or, in a shorter notation

    double sum = myCollection.AsParallel().Sum(ComplicatedFunction);
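
    As a minimal, self-contained sketch of the PLINQ version: ComplicatedFunction and the contents of myCollection below are hypothetical stand-ins (the question does not show them), and the program assumes C# 9 top-level statements. The stand-in returns multiples of 0.5 so that the total is exact no matter in which order the values are added.

    ```csharp
    using System;
    using System.Linq;

    // Hypothetical stand-in for the real work. Multiples of 0.5 are exactly
    // representable, so the total does not depend on the order of additions.
    double ComplicatedFunction(int arg) => arg * 0.5;

    // PLINQ partitions the input, sums each partition independently,
    // and combines the partial sums; no shared variable is written by the caller.
    double PlinqSum(int[] items) => items.AsParallel().Sum(arg => ComplicatedFunction(arg));

    int[] myCollection = Enumerable.Range(1, 1000).ToArray();
    Console.WriteLine(PlinqSum(myCollection)); // 250250
    ```

    With a real ComplicatedFunction that returns arbitrary doubles, the parallel total can still differ from a sequential one in the last few bits, because floating-point addition is not exactly associative.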
    
  • 2020-12-15 19:11

    Think of sum += ComplicatedFunction(...) as actually being composed of a sequence of operations, say:

    r1 <- Load current value of sum
    r2 <- ComplicatedFunction(...)
    r1 <- r1 + r2
    sum <- Store r1
    

    So now we randomly interleave two (or more) parallel instances of this. One thread may be holding a stale "old value" of sum which it uses to perform its computation, the result of which it writes back over top of some modified version of sum. It's a classic race condition, because some results are getting lost in a nondeterministic way based on how the interleaving is done.
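
    To make the lost update concrete, here is a small sketch that replays one bad interleaving by hand. It uses no real threads: the variables t1 and t2 are hypothetical stand-ins for each thread's private register, and the function name LostUpdate is made up for this illustration. It assumes C# 9 top-level statements.

    ```csharp
    using System;

    // Replays, deterministically, one interleaving of two "threads" that each
    // execute shared += value as load, add, write back.
    double LostUpdate(double a, double b)
    {
        double shared = 0.0;

        double t1 = shared;   // thread 1 loads the current value (0)
        double t2 = shared;   // thread 2 loads the same, soon stale, value (0)

        t1 = t1 + a;          // thread 1 adds its result privately
        t2 = t2 + b;          // thread 2 adds its result privately

        shared = t1;          // thread 1 writes back
        shared = t2;          // thread 2 overwrites it; thread 1's contribution is lost

        return shared;
    }

    Console.WriteLine(LostUpdate(1.0, 2.0)); // prints 2, although 1 + 2 = 3
    ```

    In a real Parallel.ForEach run the interleaving changes from run to run, which is why the total changes too.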

  • 2020-12-15 19:13

    Or you can use the parallel aggregation pattern, which .NET supports through the Parallel.ForEach overload that takes a local initializer and a final action. Here is the code:

            object locker = new object();
            double sum = 0.0;
            Parallel.ForEach(mArray,
                            () => 0.0,                                                       // Initialize the local value.
                            (i, state, localResult) => localResult + ComplicatedFunction(i), // Body delegate, which returns the new local total.
                            localTotal =>                                                    // Add the local value to the master value.
                                {
                                    lock (locker) sum += localTotal;
                                }
                            );
    
  • 2020-12-15 19:15

    As the other answers mentioned, updating the sum variable from multiple threads (which is what Parallel.ForEach does) is not a thread-safe operation. The trivial fix of acquiring a lock before doing the update will fix that problem.

    double sum = 0.0;
    Parallel.ForEach(myCollection, arg => 
    { 
      lock (myCollection)
      {
        sum += ComplicatedFunction(arg);
      }
    });
    

    However, that introduces yet another problem. Because the lock is acquired on every iteration, and ComplicatedFunction runs while the lock is held, the iterations are effectively serialized. In other words, it would have been better to just use a plain old foreach loop.

    Now, the trick in getting this right is to partition the problem into separate and independent chunks. Fortunately, that is super easy to do when all you want is to sum the results of the iterations, because the sum operation is commutative and associative and because the intermediate results of the iterations are independent.

    So here is how you do it.

    double sum = 0.0;
    Parallel.ForEach(myCollection,
        () => // Initializer
        {
            return 0D;
        },
        (item, state, subtotal) => // Loop body
        {
            return subtotal += ComplicatedFunction(item);
        },
        (subtotal) => // Accumulator
        {
            lock (myCollection)
            {
              sum += subtotal;
            }
        });
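
    For anyone who wants to run the thread-local version end to end, here is a self-contained sketch under a few assumptions: ComplicatedFunction and the input data are hypothetical stand-ins, the lock is taken on a dedicated object named gate rather than on the collection (a common precaution, not something the snippet above does), and the program uses C# 9 top-level statements.

    ```csharp
    using System;
    using System.Linq;
    using System.Threading.Tasks;

    // Hypothetical stand-in for the real work. Multiples of 0.5 are exactly
    // representable, so the total does not depend on the order of additions.
    double ComplicatedFunction(int arg) => arg * 0.5;

    double ParallelSum(int[] items)
    {
        object gate = new object();   // dedicated lock object (assumption of this sketch)
        double sum = 0.0;

        Parallel.ForEach(
            items,
            () => 0.0,                                                        // each task starts with its own subtotal
            (item, state, subtotal) => subtotal + ComplicatedFunction(item),  // touches no shared state
            subtotal => { lock (gate) { sum += subtotal; } });                // one locked merge per task

        return sum;
    }

    int[] data = Enumerable.Range(1, 1000).ToArray();
    double sequential = data.Sum(x => ComplicatedFunction(x));
    Console.WriteLine(ParallelSum(data) == sequential); // True for these exactly representable values
    ```

    The lock is taken once per task instead of once per item, so the expensive calls run concurrently and only the cheap merge is serialized.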
    