Why does the code below not print any output, whereas if we remove parallel(), it prints 0 and 1?
IntStream.iterate(0, i -> (i + 1) % 2)
    .parallel()
    .distinct()
    .limit(10)
    .forEach(System.out::println);
Stream.iterate returns 'an infinite sequential ordered Stream'. Making such a stream parallel is therefore of limited use to begin with.
According to the description of the Stream package:
For parallel streams, relaxing the ordering constraint can sometimes enable more efficient execution. Certain aggregate operations, such as filtering duplicates (distinct()) or grouped reductions (Collectors.groupingBy()) can be implemented more efficiently if ordering of elements is not relevant. Similarly, operations that are intrinsically tied to encounter order, such as limit(), may require buffering to ensure proper ordering, undermining the benefit of parallelism. In cases where the stream has an encounter order, but the user does not particularly care about that encounter order, explicitly de-ordering the stream with unordered() may improve parallel performance for some stateful or terminal operations. However, most stream pipelines, such as the "sum of weight of blocks" example above, still parallelize efficiently even under ordering constraints.
This seems to be what is happening in your case: using unordered(), it prints 0 and 1.
IntStream.iterate(0, i -> (i + 1) % 2)
.parallel()
.unordered()
.distinct()
.limit(10)
.forEach(System.out::println);
This code has a major problem, even without parallel(): after .distinct(), the stream will only ever contain two elements, so the limit never kicks in. It prints the two elements and then keeps wasting your CPU time indefinitely. This might have been what you intended, though.
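For illustration, here is a minimal sketch of my own (the class name and layout are mine, not from the question) showing the fix of bounding the stream before deduplicating: with limit() moved in front of distinct(), the pipeline sees a finite input and terminates.

import java.util.stream.IntStream;

public class LimitBeforeDistinct {
    public static void main(String[] args) {
        // limit(10) truncates the infinite source first, so distinct()
        // operates on a finite input of ten elements and the pipeline
        // terminates, printing 0 and 1 (in no guaranteed order, since
        // forEach on a parallel stream ignores encounter order).
        IntStream.iterate(0, i -> (i + 1) % 2)
                 .parallel()
                 .limit(10)
                 .distinct()
                 .forEach(System.out::println);
    }
}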
With parallel() and limit(), I believe the problem is exacerbated by the way the work is divided. I haven't traced all the way through the parallel stream code, but here is my guess:
The parallel code divides the work between multiple threads, which all chug along indefinitely since they never fill their quota. The system probably waits for each thread to finish so it can combine their results and ensure distinctness in order, but in the case you provide this will never happen.
Without the ordering requirement, results from each worker thread can be used immediately after being checked against a global distinctness set.
Without limit, I suspect different code is used to handle infinite streams: instead of waiting for the required 10 elements to fill up, results are reported as they are discovered. It's a little like an iterator that always reports hasNext() = true, first produces 0, then 1, and then hangs forever in the next() call without producing a result. In the parallel case something is waiting for several reports so it can properly combine and order them before outputting, while in the serial case it does what it can and then hangs.
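To make that analogy concrete, here is a hypothetical iterator of my own (not taken from the stream internals) with exactly that behaviour: hasNext() always answers true, two elements come out, and the third next() call blocks forever.

import java.util.PrimitiveIterator;

public class HangingIterator {
    public static void main(String[] args) {
        PrimitiveIterator.OfInt it = new PrimitiveIterator.OfInt() {
            private int count = 0;

            @Override
            public boolean hasNext() {
                return true; // always promises another element
            }

            @Override
            public int nextInt() {
                if (count < 2) {
                    return count++; // produces 0, then 1
                }
                while (true) {
                    Thread.yield(); // third call hangs forever
                }
            }
        };

        // A serial consumer prints 0 and 1, then hangs inside the
        // third nextInt() call, like the sequential stream case.
        // A consumer that buffers results before emitting them would
        // print nothing at all, like the parallel case.
        while (it.hasNext()) {
            System.out.println(it.nextInt());
        }
    }
}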
I'll try to find the exact difference in the call stacks with and without distinct() or limit(), but so far it seems very difficult to navigate the rather complex calling sequence of the stream library.
I know the code is not correct; as suggested in the other answers too, if we move limit() before distinct(), we won't get an infinite loop.
parallel() uses the fork/join framework to distribute the work, allocating all the available worker threads rather than a single one.
We should rightly expect an infinite loop here, as multiple threads keep processing data indefinitely with nothing to stop them: the limit of 10 is never reached after distinct().
It might be that it keeps trying to fork and never gets to join and move forward. Still, I believe this is more a defect in Java than anything else.
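A quick probe of my own (not from the original answer) makes the fork/join involvement visible: print the executing thread's name inside a bounded parallel pipeline, and the output shows ForkJoinPool common-pool workers alongside the calling thread.

import java.util.stream.IntStream;

public class WhichThreads {
    public static void main(String[] args) {
        // A bounded stream, so this demo terminates. With .parallel(),
        // the names include "ForkJoinPool.commonPool-worker-..." as
        // well as "main"; without it, everything runs on "main".
        IntStream.range(0, 8)
                 .parallel()
                 .forEach(i -> System.out.println(
                         i + " on " + Thread.currentThread().getName()));
    }
}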
The real cause is that an ordered parallel .distinct() is a full barrier operation, as described in the documentation:
Preserving stability for distinct() in parallel pipelines is relatively expensive (requires that the operation act as a full barrier, with substantial buffering overhead), and stability is often not needed.
A "full barrier operation" means that all the upstream operations must be performed before the downstream can start. There are only two full barrier operations in the Stream API: .sorted() (every time) and .distinct() (in the ordered parallel case). As you supply a non-short-circuited infinite stream to .distinct(), you end up with an infinite loop. By contract, .distinct() cannot just emit elements to the downstream in any order: for duplicated elements it must always emit the one appearing first in the encounter order. While it's theoretically possible to implement a parallel ordered .distinct() better, it would be a much more complex implementation.
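The full-barrier behaviour is easy to reproduce with .sorted() as well; here is a sketch of mine (do not expect it to finish):

import java.util.stream.IntStream;

public class SortedBarrier {
    public static void main(String[] args) {
        // sorted() must see ALL upstream elements before it can emit
        // any, so this prints nothing and never completes normally:
        // it keeps buffering the infinite input until memory runs out.
        IntStream.iterate(0, i -> (i + 1) % 2)
                 .sorted()
                 .limit(10)
                 .forEach(System.out::println);
    }
}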
As for the solution, @user140547 is right: add .unordered() before .distinct(). This switches the distinct() algorithm to the unordered one, which just uses a shared ConcurrentHashMap to store all the observed elements and emits every new element to the downstream. Note that adding .unordered() after .distinct() will not help.
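To underline that the placement matters, here is a contrast sketch of mine based on that last remark: with unordered() downstream of distinct(), the ordered full-barrier variant of distinct() has already been selected, so the pipeline hangs just like the original.

import java.util.stream.IntStream;

public class UnorderedTooLate {
    public static void main(String[] args) {
        // unordered() placed after distinct() does not change the
        // distinct() implementation already chosen for the ordered
        // parallel pipeline, so this still never prints anything.
        IntStream.iterate(0, i -> (i + 1) % 2)
                 .parallel()
                 .distinct()
                 .unordered()
                 .limit(10)
                 .forEach(System.out::println);
    }
}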