Parallel Framework and avoiding false sharing

攒了一身酷 2021-02-08 03:08

Recently, I answered a question about optimizing a likely parallelizable method for generating every permutation of arbitrary base numbers. I posted an answer similar to the

1 answer
  •  野趣味 (original poster)
    2021-02-08 03:58

    First off, my initial assumption regarding Parallel.For() and Parallel.ForEach() was wrong.

    The poor parallel implementation very likely has 6 threads all attempting to write to a single ConcurrentStack<T> at once. The good implementation, using thread locals (explained more below), only accesses the shared variable once per task, nearly eliminating contention.
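    To make the contrast concrete, here is a minimal sketch of the two shapes. It is not the code from the original question: Square() is a hypothetical stand-in for the real per-item work, and the class and method names are my own. The first method pushes to the shared stack on every iteration; the second collects into a private list per task and touches the shared stack once, in localFinally.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ContentionSketch
{
    // Hypothetical stand-in for the real per-item work.
    static int Square(int i) => i * i;

    // Contended shape: every iteration writes to the shared stack.
    public static ConcurrentStack<int> PushPerIteration(int count)
    {
        var shared = new ConcurrentStack<int>();
        Parallel.For(0, count, i => shared.Push(Square(i)));
        return shared;
    }

    // Thread-local shape: each task fills its own list and
    // publishes it to the shared stack once, when the task ends.
    public static ConcurrentStack<int> PushPerTask(int count)
    {
        var shared = new ConcurrentStack<int>();
        Parallel.For(0, count,
            () => new List<int>(),                    // localInit
            (i, state, local) =>                      // body
            {
                local.Add(Square(i));
                return local;
            },
            local =>                                  // localFinally
            {
                // Skip empty lists so PushRange never sees an empty array.
                if (local.Count > 0) shared.PushRange(local.ToArray());
            });
        return shared;
    }
}
```

    Both methods end up with the same set of items (in no particular order); the difference is only in how often the tasks hit the shared collection.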

    You cannot simply drop Parallel.For() or Parallel.ForEach() in place of a for or foreach loop. A blind swap might happen to improve things, but without examining the problem and instrumenting it, you are just throwing multithreading at the problem in the hope that it gets faster.
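    As a rough idea of what "instrumenting it" can look like, here is a small sketch that times a sequential loop against a blind Parallel.For() swap using Stopwatch. The workload (summing integers) and all names are assumptions chosen for illustration, and a single Stopwatch run is only a first look, not a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class TimingSketch
{
    // Plain sequential loop, used as the baseline.
    public static long SumSequential(int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++) total += i;
        return total;
    }

    // Blind swap: one interlocked write to shared state per iteration.
    public static long SumParallelNaive(int n)
    {
        long total = 0;
        Parallel.For(0, n, i => Interlocked.Add(ref total, i));
        return total;
    }

    // Runs the delegate once and returns the elapsed wall-clock time.
    public static TimeSpan Measure(Func<long> work)
    {
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.Elapsed;
    }

    // Prints both timings so they can be compared on your own machine.
    public static void Compare(int n)
    {
        Console.WriteLine($"sequential: {Measure(() => SumSequential(n))}");
        Console.WriteLine($"parallel:   {Measure(() => SumParallelNaive(n))}");
    }
}
```

    Calling TimingSketch.Compare(10_000_000) prints the two timings. The numbers depend on the machine and the workload, which is exactly why they need to be measured rather than assumed.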

    Parallel.For() and Parallel.ForEach() have overloads that let you create local state for each Task they ultimately create, and run one expression before that task's first iteration and another after its last.

    If you have an operation you parallelize with Parallel.For() or Parallel.ForEach(), it's likely a good idea to use this overload:

    public static ParallelLoopResult For<TLocal>(
        int fromInclusive,
        int toExclusive,
        Func<TLocal> localInit,
        Func<int, ParallelLoopState, TLocal, TLocal> body,
        Action<TLocal> localFinally
    )
    

    For example, this call to For() sums all integers from 1 to 100:

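    using System;
    using System.Threading;
    using System.Threading.Tasks;

    var total = 0;

    Parallel.For(0, 101, () => 0,  // <-- localInit: each task starts its own total at 0
    (i, state, localTotal) => { // <-- body: runs per iteration, touches only local state
      localTotal += i;
      return localTotal;
    }, localTotal => { // <-- localFinally: runs once per task
      Interlocked.Add(ref total, localTotal);
    });

    Console.WriteLine(total); // 5050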
    

    localInit should be a lambda that initializes the local state, which is then passed to the body and localFinally lambdas. Please note that I am not recommending summing 1 to 100 with parallelization; I just wanted a simple case to keep the example short.
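    The same pattern applies to Parallel.ForEach(), which has a matching overload taking localInit, body, and localFinally. Here is a minimal sketch, assuming a made-up task of summing string lengths (the class and method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachLocalStateSketch
{
    // Sums the lengths of all words, using one local subtotal per task.
    public static long SumOfLengths(IEnumerable<string> words)
    {
        long total = 0;
        Parallel.ForEach(
            words,
            () => 0L,                                               // localInit
            (word, state, localTotal) => localTotal + word.Length,  // body
            localTotal => Interlocked.Add(ref total, localTotal));  // localFinally
        return total;
    }
}
```

    As with For(), the shared total is only written once per task, in localFinally, instead of once per element.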
