问题
I have the following loop that runs a conversion from base95 to base10. I'm working with several thousand digit numbers, so BigIntegers are required. inst
is the base95 string.
Parallel.For(0, inst.Length, x =>
{
result += BigInteger.Pow(95, x) * (inst[x] - 32);
});
If I work with base95 strings of about 200 characters or less, it works perfectly fine and outputs the same result that a normal for
loop would.
However, once I start increasing the size of the base95 string, the parallel's output is thrown way off. I need to work with base95 strings with 1500+ characters, and even up to 30000. A regular for
loop can calculate the result fine.
What could be causing this problem? Is there a better method to this than a Parallel.For
loop that's still faster than a for
loop?
回答1:
Its just not thread safe. As to why its not corrupting with smaller strings, i'm not sure. Possibly TPL just doesn't think the workload is worthy of extra threads. Though, i did verified your results, it does produce inconsistent results with larger strings.
The only solution will be to make it thread safe. A cheap and nasty approach will be to use lock
... It would be better if you could user another thread safe approach like Interlocked, however, it wont work with BigInteger
.
BigInteger result = 0;
object sync = new object();
Parallel.For(
0,
inst.Length,
x =>
{
var temp = BigInteger.Pow(95, x) * (inst[x] - 32);
lock (sync)
result += temp;
});
Its not perfect with all the locking but its still faster than a regular for
loop on my pc
Another approach would be to use the for overloads, this way you are only locking once per each thread.
Parallel.For(
0,
inst.Length,
() => new BigInteger(0),
(x, state, subTotal) => subTotal + BigInteger.Pow(95, x) * (inst[x] - 32),
integer =>
{
lock (sync)
result += integer;
});
Benchmarks
So i was bored, here is your bench marks
Tests were run 50 times each, GC.Collect
and GC.WaitForPendingFinalizers
is run before every test to give cleaner results. All results were tested against each other to prove they are accurate. Scale
represents the size of the string as per your question
Setup
----------------------------------------------------------------------------
Mode : Release (64Bit)
Test Framework : .NET Framework 4.7.1 (CLR 4.0.30319.42000)
----------------------------------------------------------------------------
Operating System : Microsoft Windows 10 Pro
Version : 10.0.17134
----------------------------------------------------------------------------
CPU Name : Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
Description : Intel64 Family 6 Model 58 Stepping 9
Cores (Threads) : 4 (8) : Architecture : x64
Clock Speed : 3901 MHz : Bus Speed : 100 MHz
L2Cache : 1 MB : L3Cache : 8 MB
----------------------------------------------------------------------------
Results
--- Random characters -----------------------------------------------------------------
| Value | Average | Fastest | Cycles | Garbage | Test | Gain |
--- Scale 10 ----------------------------------------------------------- Time 0.259 ---
| for | 5.442 µs | 4.968 µs | 21.794 K | 0.000 B | Base | 0.00 % |
| ParallelResult | 32.451 µs | 30.397 µs | 116.808 K | 0.000 B | Pass | -496.25 % |
| ParallelLock | 35.551 µs | 32.443 µs | 127.966 K | 0.000 B | Pass | -553.22 % |
| AsParallel | 141.457 µs | 118.959 µs | 398.676 K | 0.000 B | Pass | -2,499.13 % |
--- Scale 100 ---------------------------------------------------------- Time 0.298 ---
| ParallelResult | 93.261 µs | 80.085 µs | 329.450 K | 0.000 B | Pass | 11.36 % |
| ParallelLock | 103.912 µs | 84.470 µs | 366.599 K | 0.000 B | Pass | 1.23 % |
| for | 105.210 µs | 93.823 µs | 371.025 K | 0.000 B | Base | 0.00 % |
| AsParallel | 183.538 µs | 159.002 µs | 488.534 K | 0.000 B | Pass | -74.45 % |
--- Scale 1,000 -------------------------------------------------------- Time 4.191 ---
| AsParallel | 5.701 ms | 4.932 ms | 15.479 M | 0.000 B | Pass | 65.83 % |
| ParallelResult | 6.510 ms | 5.701 ms | 18.166 M | 0.000 B | Pass | 60.98 % |
| ParallelLock | 6.734 ms | 5.303 ms | 17.314 M | 0.000 B | Pass | 59.64 % |
| for | 16.685 ms | 15.640 ms | 58.183 M | 0.000 B | Base | 0.00 % |
--- Scale 10,000 ------------------------------------------------------ Time 34.805 ---
| AsParallel | 6.205 s | 4.767 s | 19.202 B | 0.000 B | Pass | 47.20 % |
| ParallelResult | 6.286 s | 5.891 s | 14.752 B | 0.000 B | Pass | 46.51 % |
| ParallelLock | 6.290 s | 5.202 s | 9.982 B | 0.000 B | Pass | 46.48 % |
| for | 11.752 s | 11.436 s | 41.136 B | 0.000 B | Base | 0.00 % |
---------------------------------------------------------------------------------------
ParallelLock
[Test("ParallelLock", "", true)]
public BigInteger Test1(string input, int scale)
{
BigInteger result = 0;
object sync = new object();
Parallel.For(
0,
input.Length,
x =>
{
var temp = BigInteger.Pow(95, x) * (input[x] - 32);
lock (sync)
result += temp;
});
return result;
}
ParallelResult
[Test("ParallelResult", "", false)]
public BigInteger Test2(string input, int scale)
{
BigInteger result = 0;
object sync = new object();
Parallel.For(
0,
input.Length,
() => new BigInteger(0),
(x, state, subTotal) => subTotal + BigInteger.Pow(95, x) * (input[x] - 32),
integer =>
{
lock (sync)
result += integer;
});
return result;
}
AsParallel as tendered by gdir
[Test("AsParallel", "", false)]
public BigInteger Test4(string input, int scale)
{
return Enumerable.Range(0, input.Length)
.AsParallel()
.Aggregate(
new BigInteger(0),
(subtotal, x) => subtotal + BigInteger.Pow(95, x) * (input[x] - 32),
(total, thisThread) => total + thisThread,
(finalSum) => finalSum);;
}
for
[Test("for", "", false)]
public BigInteger Test3(string input, int scale)
{
BigInteger result = 0;
for (int i = 0; i < input.Length; i++)
{
result += BigInteger.Pow(95, i) * (input[i] - 32);
}
return result;
}
Input
public static string StringOfChar(int scale)
{
var list = Enumerable.Range(1, scale)
.Select(x => (char)(_rand.Next(32)+32))
.ToArray();
return string.Join("", list);
}
Validation
private static bool Validation(BigInteger result, BigInteger baseLine)
{
return result == baseLine;
}
Summary
Parallel will give you a performance boost, the less you can lock the better it is in theory, however there maybe many factors of why the results played out the way they did. its seems the result overload seems to work well, but is pretty similar with the larger workloads, i'm not really sure why. Note i didn't play with the parallel options, and you might be able to tweak it a little bit more for your solution
anyway, good luck
来源:https://stackoverflow.com/questions/52215026/parallel-for-with-biginteger-calculations-output-different-than-for-loop