Nested parallel streams in Java

蹲街弑〆低调 submitted on 2021-01-24 08:44:27

Question


I want to understand the ordering constraints between nested streams in Java.

Example 1:

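import java.util.stream.IntStream;

public class Example1 {
    public static void main(String[] args) {
        // Sequential outer stream: handles one i at a time, in order.
        IntStream.range(0, 10).forEach(i -> {
            System.out.println(i);
            // Sequential inner stream: finishes every j before the next i starts.
            IntStream.range(0, 10).forEach(j -> {
                System.out.println("    " + i + " " + j);
            });
        });
    }
}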

This code executes deterministically, so the inner loop runs forEach on each j before the outer loop runs its own forEach on the next i:

0
    0 0
    0 1
    0 2
    0 3
    0 4
    0 5
    0 6
    0 7
    0 8
    0 9
1
    1 0
    1 1
    1 2
    1 3
    1 4
    1 5
    1 6
    1 7
    1 8
    1 9
2
    2 0
    2 1
    2 2
    2 3
...

Example 2:

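import java.util.stream.IntStream;

public class Example2 {
    public static void main(String[] args) {
        // Parallel outer stream: the i values may be processed on different threads.
        IntStream.range(0, 10).parallel().forEach(i -> {
            System.out.println(i);
            // Parallel inner stream: submits its work to the same common pool.
            IntStream.range(0, 10).parallel().forEach(j -> {
                System.out.println("    " + i + " " + j);
            });
        });
    }
}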

If both streams are made parallel(), as in this second example, I could imagine the inner workers blocking while they wait for threads to become available in the outer work queue, since the outer work queue threads have to block until the inner stream completes, and the default thread pool has only a limited number of threads. However, deadlock does not appear to occur:

6
5
8
    8 6
0
1
    6 2
7
    1 6
    8 5
    7 6
    8 8
2
    0 6
    0 2
    0 8
    5 2
    5 4
    5 6
    0 5
    2 6
    7 2
    7 5
    7 8
    6 4
    8 9
    1 5
 ...

Both streams share the same default thread pool, yet they generate different work units. Each outer work unit can only complete after all inner units for that outer work unit have completed, since there is a completion barrier at the end of each parallel stream.

How is the coordination between these inner and outer streams managed across the shared pool of worker threads, without any sort of deadlock?


Answer 1:


The thread pool behind parallel streams is the common pool, which you can get with ForkJoinPool.commonPool(). By default it uses NumberOfProcessors - 1 workers. To resolve dependencies like the ones you describe, it can dynamically create additional workers when (some) current workers are blocked and a deadlock becomes possible.
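To see these numbers on your own machine, a minimal sketch like the following can help. The class name CommonPoolInfo and its helper method are made up for illustration; the calls to ForkJoinPool.getCommonPoolParallelism() and Runtime.availableProcessors() are standard JDK APIs.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class, only for illustration.
class CommonPoolInfo {
    // Target parallelism of the common pool (the JDK reports at least 1).
    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors:    " + cpus);
        System.out.println("common pool parallelism: " + commonParallelism());
    }
}
```

The value can differ from processors - 1 if the system property java.util.concurrent.ForkJoinPool.common.parallelism is set, or on a single-processor machine.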

However, this is not the answer for your case.

Tasks in a ForkJoinPool have two important functionalities:

  • They can create subtasks and split the current task into smaller pieces (fork).
  • They can wait for the subtasks (join).

When a thread executes such a task A and joins a subtask B, it does not simply block until B finishes; it executes another task C in the meantime. When C is finished, the thread returns to A and checks whether B is finished. Note that B and C can be (and most likely are) the same task. If B is finished, then A has successfully waited for/joined it (non-blocking!). Check out this guide if the previous explanation is not clear.
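The fork/join pattern described above can be sketched with a plain RecursiveTask. This is an illustrative example, not the code the stream library runs internally; the class name RangeSum and the THRESHOLD value are assumptions chosen for the demo.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical task: sums the integers in [from, to).
class RangeSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cut-off for going sequential
    private final int from;
    private final int to;

    RangeSum(int from, int to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: do the work directly on the current thread.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += i;
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        RangeSum left = new RangeSum(from, mid);
        RangeSum right = new RangeSum(mid, to);
        left.fork();                       // queue the left half for any worker
        long rightSum = right.compute();   // run the right half on this thread
        // join() does not just park the worker: while waiting, the worker
        // may run other pending tasks (possibly the left half itself).
        return left.join() + rightSum;
    }

    static long sum(int from, int to) {
        return ForkJoinPool.commonPool().invoke(new RangeSum(from, to));
    }

    public static void main(String[] args) {
        System.out.println(sum(0, 100_000));
    }
}
```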

Now when you use a parallel stream, the range of the stream is split into tasks recursively until the tasks become so small that they can be executed sequentially more efficiently. Those tasks are put into a work queue (there is one for each worker) in the common pool. So, what IntStream.range(0, 100).parallel().forEach does is split the range recursively until further splitting is no longer worth it. Each final task, or rather batch of iterations, is then executed sequentially with the code provided to forEach. At this point the workers in the common pool can just execute those tasks until all are done and the stream can return. Note that the calling thread helps out with the execution by joining subtasks!
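The recursive splitting can be made visible by driving the stream's Spliterator by hand. The sketch below is only an approximation of what the stream machinery does: the class name SplitSketch and the leaf size of 16 are assumptions, and the real library picks its own target size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.IntStream;

// Hypothetical helper class, only for illustration.
class SplitSketch {
    // Splits `chunk` recursively until a piece holds at most `leafSize`
    // elements (or refuses to split), recording the size of every leaf.
    static void split(Spliterator.OfInt chunk, long leafSize, List<Long> leaves) {
        if (chunk.estimateSize() > leafSize) {
            Spliterator.OfInt prefix = chunk.trySplit(); // null if it cannot split
            if (prefix != null) {
                split(prefix, leafSize, leaves); // first part of the range
                split(chunk, leafSize, leaves);  // remainder stays in `chunk`
                return;
            }
        }
        leaves.add(chunk.estimateSize());
    }

    static List<Long> leafSizes(int n, long leafSize) {
        List<Long> leaves = new ArrayList<>();
        split(IntStream.range(0, n).spliterator(), leafSize, leaves);
        return leaves;
    }

    public static void main(String[] args) {
        // Prints the sizes of the leaf chunks that range(0, 100) breaks into.
        System.out.println(leafSizes(100, 16));
    }
}
```

In a parallel stream, each such leaf chunk would become one task that runs the forEach body sequentially over its elements.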

Now each of those tasks uses a parallel stream itself in your case. The procedure is the same; split it up into smaller tasks and put those tasks into a work queue in the common pool. From the ForkJoinPool's perspective those are just additional tasks on top of the already present ones. The workers just keep executing/joining tasks until all are done and the outer stream can return.
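One way to convince yourself that each inner stream completes before its outer iteration finishes is to count inner iterations per outer element. The following is a small sketch with a made-up class name (BarrierCheck); it relies only on the fact that forEach returns after all elements have been processed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

// Hypothetical helper class, only for illustration.
class BarrierCheck {
    // Returns how many outer iterations saw all 10 inner iterations finished
    // at the moment their inner forEach returned.
    static int outerIterationsWithCompleteInner() {
        AtomicInteger complete = new AtomicInteger();
        IntStream.range(0, 10).parallel().forEach(i -> {
            AtomicInteger innerDone = new AtomicInteger(); // one counter per outer i
            IntStream.range(0, 10).parallel().forEach(j -> innerDone.incrementAndGet());
            // The inner forEach has returned, so all inner tasks must be done.
            if (innerDone.get() == 10) {
                complete.incrementAndGet();
            }
        });
        return complete.get();
    }

    public static void main(String[] args) {
        System.out.println(outerIterationsWithCompleteInner() + " of 10 outer iterations saw a complete inner stream");
    }
}
```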

This is what you see in the output: there is no deterministic behaviour and no fixed order. A deadlock cannot occur either, because in this use case no thread blocks.

You can check the explanation with the following code:

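    import java.util.stream.IntStream;

    public class NestedParallelCheck {
        public static void main(String[] args) {
            IntStream.range(0, 10).parallel().forEach(i -> {
                IntStream.range(0, 10).parallel().forEach(j -> {
                    // Busy work so that each task runs long enough for other threads to join in.
                    for (int x = 0; x < 1e6; x++) { Math.sqrt(Math.log(x)); }
                    // Print which thread executed this (i, j) pair.
                    System.out.printf("%d %d %s\n", i, j, Thread.currentThread().getName());
                    for (int x = 0; x < 1e6; x++) { Math.sqrt(Math.log(x)); }
                });
            });
        }
    }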

You should notice that the main thread is involved in the execution of the inner iterations, so it is not (!) blocked. The common pool workers just pick tasks one after another until all are finished.
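If the raw output is hard to scan, a variant that only collects the distinct thread names may be easier to read. This is a sketch with a hypothetical class name (WhoRanWhat); which names show up, and whether main is among them, can vary from run to run.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

// Hypothetical helper class, only for illustration.
class WhoRanWhat {
    // Runs the nested parallel streams and returns the names of all threads
    // that executed at least one inner iteration.
    static Set<String> threadsUsed() {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, 10).parallel().forEach(i ->
            IntStream.range(0, 10).parallel().forEach(j ->
                names.add(Thread.currentThread().getName())));
        return names;
    }

    public static void main(String[] args) {
        // Typically lists ForkJoinPool.commonPool-worker-N threads and often main.
        threadsUsed().forEach(System.out::println);
    }
}
```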



Source: https://stackoverflow.com/questions/62670334/nested-parallel-streams-in-java
