问题
Recently, I was running some scalability experiments using Java Fork-Join. Here, I used the non-default ForkJoinPool constructor ForkJoinPool(int parallelism)
, passing the desired parallelism (# workers) as constructor argument.
Specifically, using the following piece of code:
public static void main(String[] args) throws InterruptedException {
ForkJoinPool pool = new ForkJoinPool(Integer.parseInt(args[0]));
pool.invoke(new ParallelLoopTask());
}
static class ParallelLoopTask extends RecursiveAction {
final int n = 1000;
@Override
protected void compute() {
RecursiveAction[] T = new RecursiveAction[n];
for(int p = 0; p < n; p++){
T[p] = new DummyTask();
T[p].fork();
}
for(int p = 0; p < n; p++){
T[p].join();
}
/*
//The problem does not occur when tasks are joined in the reverse order, i.e.
for(int p = n-1; p >= 0; p--){
T[p].join();
}
*/
}
}
static public class DummyTask extends RecursiveAction {
//performs some dummy work
final int N = 10000000;
//avoid memory bus contention by restricting access to cache (which is distributed)
double val = 1;
@Override
protected void compute() {
for(int j = 0; j < N; j++){
if(val < 11){
val *= 1.1;
}else{
val = 1;
}
}
}
}
I got these results on a processor with 4 physical and 8 logical cores (Using java 8: jre1.8.0_45):
T1: 11730
T2: 2381 (speedup: 4,93)
T4: 2463 (speedup: 4,76)
T8: 2418 (speedup: 4,85)
While when using java 7 (jre1.7.0), I get
T1: 11938
T2: 11843 (speedup: 1,01)
T4: 5133 (speedup: 2,33)
T8: 2607 (speedup: 4,58)
(where TP is the execution time in ms, using parallelism level P)
While both results surprise me, the latter I can understand (the join will cause 1 worker (executing the loop) to block, as it fails to recognize that it could, while waiting, process other pending dummy tasks from its local queue). The former, however, got me puzzled.
BTW: When counting the number of started, but not yet completed dummy tasks, I found that up to 24 such tasks existed in a pool with parallelism 2 at some point in time...?
EDIT:
I benchmarked the application above using JMH (jdk1.8.0_45) (options -bm avgt -f 1) (= 1 fork, 20+20 iterations) The results below
T1: 11,664
11,664 ±(99.9%) 0,044 s/op [Average]
(min, avg, max) = (11,597, 11,664, 11,810), stdev = 0,050
CI (99.9%): [11,620, 11,708] (assumes normal distribution)
T2: 4,134 (speedup: 2,82)
4,134 ±(99.9%) 0,787 s/op [Average]
(min, avg, max) = (3,045, 4,134, 5,376), stdev = 0,906
CI (99.9%): [3,348, 4,921] (assumes normal distribution)
T4: 2,972 (speedup: 3,92)
2,972 ±(99.9%) 0,212 s/op [Average]
(min, avg, max) = (2,375, 2,972, 3,200), stdev = 0,245
CI (99.9%): [2,759, 3,184] (assumes normal distribution)
T8: 2,845 (speedup: 4,10)
2,845 ±(99.9%) 0,306 s/op [Average]
(min, avg, max) = (2,277, 2,845, 3,310), stdev = 0,352
CI (99.9%): [2,540, 3,151] (assumes normal distribution)
At first sight one would think these scalability results are closer to what one would expect i.e. T1 < T2 < T4 ~ T8. However, what still bugs me is the following:
- The difference for T2 between java 7 and 8. I guess one explanation would be that the worker executing the parallel loop does not go idle in java 8, but instead finds other work to perform.
- The super-linear speedup (3x) with 2 workers. Also, note that T2 seems to increase with every iteration (see below, note that this is also the case, although to a smaller extent with P=4,8). The times in the first iterations of the warmup are similar to those mentioned above. Maybe the warmup period should be longer, but still, isn't it strange that execution time increases, i.e. I'd rather expect it to decrease?
- Finally, I still find the observation that there are a lot more started & not completed dummy tasks than worker threads curious.
>
Run progress: 0,00% complete, ETA 00:00:40
Fork: 1 of 1
Warmup Iteration 1: 2,365 s/op
Warmup Iteration 2: 2,341 s/op
Warmup Iteration 3: 2,393 s/op
Warmup Iteration 4: 2,323 s/op
Warmup Iteration 5: 2,925 s/op
Warmup Iteration 6: 3,040 s/op
Warmup Iteration 7: 2,304 s/op
Warmup Iteration 8: 2,347 s/op
Warmup Iteration 9: 2,939 s/op
Warmup Iteration 10: 3,083 s/op
Warmup Iteration 11: 3,004 s/op
Warmup Iteration 12: 2,327 s/op
Warmup Iteration 13: 3,083 s/op
Warmup Iteration 14: 3,229 s/op
Warmup Iteration 15: 3,076 s/op
Warmup Iteration 16: 2,325 s/op
Warmup Iteration 17: 2,993 s/op
Warmup Iteration 18: 3,112 s/op
Warmup Iteration 19: 3,074 s/op
Warmup Iteration 20: 2,354 s/op
Iteration 1: 3,045 s/op
Iteration 2: 3,094 s/op
Iteration 3: 3,113 s/op
Iteration 4: 3,057 s/op
Iteration 5: 3,050 s/op
Iteration 6: 3,106 s/op
Iteration 7: 3,080 s/op
Iteration 8: 3,370 s/op
Iteration 9: 4,482 s/op
Iteration 10: 4,325 s/op
Iteration 11: 5,002 s/op
Iteration 12: 4,980 s/op
Iteration 13: 5,121 s/op
Iteration 14: 4,310 s/op
Iteration 15: 5,146 s/op
Iteration 16: 5,376 s/op
Iteration 17: 4,810 s/op
Iteration 18: 4,320 s/op
Iteration 19: 5,249 s/op
Iteration 20: 4,654 s/op
回答1:
There is nothing in your example of how you did this benchmark. It looks like you just did a milli-time at the beginning and end of the run. This is not accurate. I suggest you take a look at this SO answer and re-post your timings. BTW the jmh benchmark is going to be the standard in Java9 so that is what you should be using.
EDIT:
You admit that the scalability results are what you expected. But you say you’re still not happy with the results. Now it’s time to look inside the code.
There are serious problems with this framework. I’ve been writing a critique about it since 2010. As I point out here , join doesn’t work. The author has tried various means to get around the problem but the problem persists.
Increase your run time to about a minute, (n=100000000) or put some heavy computations in the compute(). Now profile the application in VisualVM or another profiler. This will show you the stalling threads, excessive threads, etc.
If that doesn’t help answer your questions than you should look at the code flow using a debugger. Profiling/code analysis is the only way you are going to get satisfactory answers to your questions.
来源:https://stackoverflow.com/questions/31267341/unexpected-scalability-results-in-java-fork-join-java-8