Question
After looking at the Fork/Join tutorial, I created a class for computing large factorials:
import java.math.BigInteger;
import java.util.concurrent.RecursiveTask;

public class ForkFactorial extends RecursiveTask<BigInteger> {

    private static final int THRESHOLD = 10;

    final int start; // inclusive
    final int end;   // exclusive

    public ForkFactorial(int n) {
        this(1, n + 1);
    }

    private ForkFactorial(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    protected BigInteger compute() {
        if (end - start < THRESHOLD) {
            return computeDirectly();
        } else {
            int mid = (start + end) / 2;
            ForkFactorial lower = new ForkFactorial(start, mid);
            lower.fork(); // hand the lower half to the pool
            ForkFactorial upper = new ForkFactorial(mid, end);
            BigInteger upperVal = upper.compute(); // compute the upper half in this thread
            return lower.join().multiply(upperVal);
        }
    }

    private BigInteger computeDirectly() {
        BigInteger val = BigInteger.ONE;
        BigInteger mult = BigInteger.valueOf(start);
        for (int iter = start; iter < end; iter++, mult = mult.add(BigInteger.ONE)) {
            val = val.multiply(mult);
        }
        return val;
    }
}
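For reference, a minimal sketch of how the task can be invoked (assuming only the standard java.util.concurrent.ForkJoinPool):

import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;

public class Main {
    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool(); // defaults to one worker per core
        BigInteger result = pool.invoke(new ForkFactorial(100));
        System.out.println("100! = " + result);
    }
}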
The question I have is how to determine the threshold below which a task computes sequentially instead of forking. I found a page on fork/join parallelism which states:
One of the main things to consider when implementing an algorithm using fork/join parallelism is choosing the threshold which determines whether a task will execute a sequential computation rather than forking parallel sub-tasks.
If the threshold is too large, then the program might not create enough tasks to fully take advantage of the available processors/cores.
If the threshold is too small, then the overhead of task creation and management could become significant.
In general, some experimentation will be necessary to find an appropriate threshold value.
So what experimentation would I need to do in order to determine the threshold?
Answer 1:
Pigeonhole estimation: set an arbitrary threshold and measure the computation time. Then raise or lower the threshold and measure again, and keep adjusting until lowering the threshold no longer improves the computation time.
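A minimal sketch of such a sweep is below. It assumes ForkFactorial has been modified to take the threshold as a constructor argument (the two-argument ForkFactorial(n, threshold) constructor is an assumption, not part of the original class), and that rough wall-clock timing is acceptable:

import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;

public class ThresholdSweep {
    public static void main(String[] args) {
        int n = 100_000; // pick a problem size representative of real use
        ForkJoinPool pool = new ForkJoinPool();
        // Assumes ForkFactorial accepts the threshold as a constructor
        // argument instead of the hard-coded THRESHOLD constant.
        for (int threshold : new int[] {1000, 500, 250, 100, 50, 25, 10, 5}) {
            long startNs = System.nanoTime();
            BigInteger result = pool.invoke(new ForkFactorial(n, threshold));
            long elapsedMs = (System.nanoTime() - startNs) / 1_000_000;
            // Print a property of the result so the work cannot be optimized away.
            System.out.printf("threshold=%d: %d ms (bits=%d)%n",
                    threshold, elapsedMs, result.bitLength());
        }
        // For trustworthy numbers, repeat each measurement after JIT
        // warm-up, or use a harness such as JMH.
    }
}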
Answer 2:
Choosing a threshold depends on many factors:
The actual computation should take a reasonable amount of time. If you're summing an array and the array is small, then it is probably better to do it sequentially. If the array length is 16M, then splitting it into smaller pieces and processing them in parallel should be worthwhile. Try it and see.
The number of processors should be sufficient. Doug Lea once documented his framework as needing 16+ processors to be worthwhile. Even splitting an array in half and running on two threads will only produce about a 1.3x gain in throughput, and on top of that you have to consider the split/join overhead. Try running on many configurations to see what you get (a sketch follows below).
The number of concurrent requests should be small. If you have N processors and 8N concurrent requests, then using one thread per request is often more efficient for throughput. The logic here is simple: if you have N processors available and you split your work accordingly, but there are hundreds of other tasks ahead of you, then what's the point of splitting?
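One sketch of running on many configurations is to time the same task on pools of different parallelism. It reuses the ForkFactorial class from the question and assumes only the standard ForkJoinPool(int parallelism) constructor:

import java.math.BigInteger;
import java.util.concurrent.ForkJoinPool;

public class ParallelismSweep {
    public static void main(String[] args) {
        int n = 100_000;
        int cores = Runtime.getRuntime().availableProcessors();
        for (int parallelism = 1; parallelism <= cores; parallelism *= 2) {
            ForkJoinPool pool = new ForkJoinPool(parallelism);
            long startNs = System.nanoTime();
            BigInteger result = pool.invoke(new ForkFactorial(n));
            long elapsedMs = (System.nanoTime() - startNs) / 1_000_000;
            System.out.printf("parallelism=%d: %d ms (bits=%d)%n",
                    parallelism, elapsedMs, result.bitLength());
            pool.shutdown();
        }
    }
}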
This is what experimenting means.
Unfortunately, this framework doesn't come with the means for accountability: there is no way to see the load on each thread, the high-water mark in the deques, the total requests processed, the errors encountered, and so on.
Good luck.
Answer 3:
Note that BigInteger arithmetic is not constant time; the cost of each operation grows with the length of its operands. The actual complexity of each operation is not readily at hand, though the futureboy implementation referenced in that Q/A section does document what it expects to achieve under different circumstances.
Getting the work-estimating function right is important both when deciding how to partition the problem into smaller chunks and when determining whether a particular chunk is worth dividing again.
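For illustration, one possible estimating function for the factorial task weights each multiplication by the bit lengths of its operands. The helper below and its cost model are a sketch of the idea, not something from the answer:

import java.math.BigInteger;

public class WorkEstimate {
    // Hypothetical helper: rough relative cost of multiplying together the
    // integers in [start, end). The partial product's bit length grows as
    // the range is consumed, so the later multiplications dominate.
    static long estimateCost(int start, int end) {
        long bits = 0; // running bit length of the partial product
        long cost = 0; // accumulated cost, in bit-pair units
        for (int i = start; i < end; i++) {
            long multBits = BigInteger.valueOf(i).bitLength();
            cost += bits * multBits; // schoolbook multiplication is ~O(m * n)
            bits += multBits;        // product length is ~ the sum of operand lengths
        }
        return cost;
    }
}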
When using experimentation to determine your threshold, you need to take care that you do not just benchmark one corner of the problem space.
Answer 4:
As I understand it, this experimentation is an optimization, so it should be applied only when there is a real need.
You could experiment with different split strategies: e.g., split into two equal parts, or split by estimated multiplication cost, which depends on the length of the integers (a sketch follows at the end of this answer).
For each strategy, you could test as many threshold values as possible to get the full picture. If you are limited in CPU resources, you could test, say, every 5th or 10th value. In my experience, the first important thing here is to get the full picture of how your algorithm performs.
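As a sketch of the second strategy, the split point can be chosen so that the two halves carry roughly equal estimated multiplication cost rather than equal counts. The cost model below (schoolbook multiplication, proportional to the product of the operand bit lengths, as in the sketch under Answer 3) is an assumption:

import java.math.BigInteger;

public class CostBalancedSplit {
    // Hypothetical: pick a split point in [start, end) so the two halves
    // have roughly equal estimated multiplication cost. For factorials the
    // balanced point lies above the midpoint, because multiplications
    // involving the long partial product dominate.
    static int balancedSplit(int start, int end) {
        int len = end - start;
        long[] prefix = new long[len + 1]; // prefix[k] = estimated cost of [start, start + k)
        long bits = 0;                     // running bit length of the partial product
        for (int k = 1; k <= len; k++) {
            long multBits = BigInteger.valueOf(start + k - 1).bitLength();
            prefix[k] = prefix[k - 1] + bits * multBits; // schoolbook multiply ~ m * n bits
            bits += multBits;
        }
        long half = prefix[len] / 2;
        for (int k = 1; k < len; k++) {
            if (prefix[k] >= half) {
                return start + k; // lower sub-task gets [start, start + k)
            }
        }
        return (start + end) / 2; // fallback for very small ranges
    }
}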
Source: https://stackoverflow.com/questions/20177364/how-to-determine-the-proper-work-division-threshold-of-a-fork-join-task