I am trying to speed up my calculation times by using Parallel.For
. I have an Intel Core i7 Q840 CPU with 8 cores, but I only manage to get a performance ratio of 4
Using a global variable can introduce significant synchronization problems, even when you are not using locks. When you assign a value to the variable each core will have to get access to the same place in system memory, or wait for the other core to finish before accessing it. You can avoid corruption without locks by using the lighter Interlocked.Add method to add a value to the sum atomically, at the OS level, but you will still get delays due to contention.
The proper way to do this is to update a thread local variable to create the partial sums and add all of them to a single global sum at the end. Parallel.For has an overload that does just this. MSDN even has an example using sumation at How To: Write a Parallel.For Loop that has Thread Local Variables
int[] nums = Enumerable.Range(0, 1000000).ToArray();
long total = 0;
// Use type parameter to make subtotal a long, not an int
Parallel.For(0, nums.Length, () => 0, (j, loop, subtotal) =>
{
subtotal += nums[j];
return subtotal;
},
(x) => Interlocked.Add(ref total, x)
);
Each thread updates its own subtotal value and updates the global total using Interlocked.Add when it finishes.