Question
I have a section of C# code as follows. It sums a column of doubles in a DataTable:
var data = this.Db.ExecuteRead(query, this.Score.Name);
var time = 0.0;
foreach (DataRow row in data.Rows)
{
    time += this.ParseDouble(row[0].ToString()) / MillisecondsPerMinute;
}
This code takes 4 seconds to execute. I wanted to speed it up, so I parallelized it as follows:
Parallel.ForEach(
    data.AsEnumerable(),
    row =>
    {
        time += this.ParseDouble(row[0].ToString()) / MillisecondsPerMinute;
    });
This code takes 3 seconds to execute, but it also causes collisions; I don't think updating a 'double' is thread safe. This was expected. I then added a Mutex to make it thread safe:
Parallel.ForEach(
    data.AsEnumerable(),
    row =>
    {
        mut.WaitOne();
        time += this.ParseDouble(row[0].ToString()) / MillisecondsPerMinute;
        mut.ReleaseMutex();
    });
This code is much slower. It takes 15 seconds to execute, but it produces accurate results. My question is: am I better off staying with the standard 'foreach' here, or can I implement the multithreading in a better way?
For reference, here is the ParseDouble method:
protected double ParseDouble(string text)
{
    double value;
    if (!double.TryParse(text, out value))
    {
        throw new DoubleExpectedException();
    }
    return value;
}
Answer 1:
Here are some approaches. First, a simple Parallel.ForEach that reduces the protected region (the lock) to the absolute minimum required: the update of the shared state. This should minimize contention for the lock.
DataTable data = this.Db.ExecuteRead(query, this.Score.Name);
double totalTime = 0.0;
Parallel.ForEach(data.AsEnumerable(), row =>
{
    double time = Double.Parse(row[0].ToString()) / MillisecondsPerMinute;
    lock (data) { totalTime += time; }
});
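A variation on the snippet above is the localInit/localFinally overload of Parallel.ForEach, which keeps a per-thread partial sum so the lock is taken once per worker thread rather than once per row. A minimal self-contained sketch; the DataTable contents and the MillisecondsPerMinute value here are hypothetical stand-ins for the question's data:

```csharp
using System;
using System.Data;
using System.Linq;
using System.Threading.Tasks;

class Demo
{
    const double MillisecondsPerMinute = 60_000.0;

    static void Main()
    {
        // Hypothetical stand-in for the DataTable returned by ExecuteRead:
        // 1000 rows whose millisecond values are i * 60000, i.e. i minutes.
        var data = new DataTable();
        data.Columns.Add("ms", typeof(string));
        for (int i = 0; i < 1000; i++)
        {
            data.Rows.Add((i * 60_000.0).ToString());
        }

        double totalTime = 0.0;
        object gate = new object();

        Parallel.ForEach(
            data.AsEnumerable(),
            () => 0.0,                     // localInit: per-thread seed
            (row, state, partial) =>       // body: accumulate privately, no lock
                partial + double.Parse(row[0].ToString()) / MillisecondsPerMinute,
            partial =>                     // localFinally: merge once per thread
            {
                lock (gate) { totalTime += partial; }
            });

        Console.WriteLine(totalTime); // sum of 0..999 minutes = 499500
    }
}
```

This is the same thread-local accumulation pattern that the Partitioner example below relies on, just without the explicit range chunking.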
A PLINQ approach. Simple and safe, but probably not the most efficient:
double totalTime = data
    .AsEnumerable()
    .AsParallel()
    .Select(row => Double.Parse(row[0].ToString()) / MillisecondsPerMinute)
    .Sum();
The combination of Parallel.ForEach and Partitioner.Create should give the best performance, because it allows chunking the workload:
double totalTime = 0.0;
Parallel.ForEach(Partitioner.Create(0, data.Rows.Count), () => 0.0D,
    (range, state, accumulator) =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
        {
            DataRow row = data.Rows[i];
            accumulator += Double.Parse(row[0].ToString()) / MillisecondsPerMinute;
        }
        return accumulator;
    },
    accumulator =>
    {
        lock (data) { totalTime += accumulator; }
    });
Answer 2:
Parallelization is not always preferable. For fast loop bodies, the overhead of Parallel.ForEach can actually degrade performance. Also, if your iterations do not depend on one another, go ahead and use it; if they do depend on each other, I would suggest sticking with a regular foreach.
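To illustrate the point about fast loop bodies, here is a sketch with arbitrary made-up data: when the body is as cheap as a single addition, per-element synchronization and scheduling overhead can easily cost more than the work itself, so the sequential loop often wins.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class OverheadDemo
{
    static void Main()
    {
        // Arbitrary illustrative data: one million small integers.
        int[] numbers = Enumerable.Range(0, 1_000_000).ToArray();

        // Cheap body: a single add per element, no overhead at all.
        long sequential = 0;
        foreach (int n in numbers)
        {
            sequential += n;
        }

        // The parallel version pays scheduling plus one interlocked
        // operation per element, which dwarfs the cost of the add itself.
        long parallel = 0;
        Parallel.ForEach(numbers, n => Interlocked.Add(ref parallel, n));

        Console.WriteLine(sequential == parallel); // True: same result, but time both
    }
}
```

Timing the two loops (for example with System.Diagnostics.Stopwatch) on such a workload typically shows the sequential version ahead, which is the trade-off this answer is describing.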
Source: https://stackoverflow.com/questions/65204040/converting-from-a-foreach-loop-to-a-parallel-foreach-loop-when-summarizing-into