The default parallelStream() in Java 8 uses the common ForkJoinPool,
which may be a latency problem if the common pool's threads are exhausted when a task is submitted.
In short, yes, there are some problems with your solution. It definitely improves on using blocking code inside a parallel stream directly, and some third-party libraries provide a similar solution (see, for example, the Blocking class in the jOOλ library). However, this solution does not change the internal splitting strategy used by the Stream API. The number of subtasks created by the Stream API is controlled by a predefined constant in the AbstractTask class:
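The idea behind such helpers is the standard ForkJoinPool.ManagedBlocker mechanism: a blocked worker tells the pool it is blocking, and the pool may spawn a compensation thread. A minimal hand-rolled sketch (the sleepBlocking helper below is my own illustration, not the jOOλ API):

```java
import java.util.concurrent.ForkJoinPool;

public class BlockingSleep {
    // Wraps Thread.sleep in a ManagedBlocker so the common pool can
    // compensate with extra threads while a worker is blocked.
    public static void sleepBlocking(long millis) {
        try {
            ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker() {
                private boolean done;

                @Override
                public boolean block() throws InterruptedException {
                    Thread.sleep(millis);
                    done = true;
                    return true;
                }

                @Override
                public boolean isReleasable() {
                    return done;
                }
            });
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

This keeps the pool responsive while threads block, but, as explained below, it does not create more subtasks than the splitting strategy allows.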
/**
* Default target factor of leaf tasks for parallel decomposition.
* To allow load balancing, we over-partition, currently to approximately
* four tasks per processor, which enables others to help out
* if leaf tasks are uneven or some processors are otherwise busy.
*/
static final int LEAF_TARGET = ForkJoinPool.getCommonPoolParallelism() << 2;
As you can see, it's four times the common pool parallelism (which by default is the number of CPU cores). The real splitting algorithm is a little trickier, but roughly you cannot have more than 4x-8x tasks, even if all of them are blocking.
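You can recompute the same formula on your own machine to see the bound (LEAF_TARGET itself is package-private, so this just mirrors its expression):

```java
import java.util.concurrent.ForkJoinPool;

public class LeafTargetCheck {
    // Mirrors the LEAF_TARGET formula quoted above: 4x the common pool parallelism.
    public static int leafTarget() {
        return ForkJoinPool.getCommonPoolParallelism() << 2;
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism()
                + ", leaf target: " + leafTarget());
    }
}
```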
For example, if you have 8 CPU cores, your Thread.sleep() test will work nicely up to IntStream.range(0, 32) (as 32 = 8*4). However, for IntStream.range(0, 64) you will have 32 parallel tasks, each processing two input numbers, so the whole processing would take 20 seconds, not 10.
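For the same reason, the commonly suggested workaround of running the parallel stream inside a dedicated ForkJoinPool (a sketch under that assumption; pool size 32 is arbitrary) gives you more threads for blocked workers, but does not make the splitting finer, since LEAF_TARGET is still computed from the common pool's parallelism:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CustomPoolDemo {
    // Runs a parallel stream inside a dedicated pool. The extra threads help
    // when workers block, but the number of leaf tasks is still bounded by
    // the common pool's parallelism, not this pool's size.
    public static int sumInPool(int poolSize, int n) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(poolSize);
        try {
            Callable<Integer> task = () ->
                    IntStream.range(0, n)
                             .parallel()
                             .sum(); // imagine a blocking call per element here
            return pool.submit(task).get();
        } finally {
            pool.shutdown();
        }
    }
}
```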