Is it possible to specify a custom thread pool for Java 8 parallel stream? I can not find it anywhere.
Imagine that I have a server application and I would like to
If you don't want to rely on implementation hacks, there's always a way to achieve the same by implementing custom collectors that will combine map
and collect
semantics... and you wouldn't be limited to ForkJoinPool:
list.stream()
.collect(parallelToList(i -> fetchFromDb(i), executor))
.join()
Luckily, it's done already here and available on Maven Central: http://github.com/pivovarit/parallel-collectors
Disclaimer: I wrote it and take responsibility for it.
Go to get AbacusUtil. Thread number can by specified for parallel stream. Here is the sample code:
LongStream.range(4, 1_000_000).parallel(threadNum)...
Disclosure: I'm the developer of AbacusUtil.
To measure the actual number of used threads, you can check Thread.activeCount()
:
Runnable r = () -> IntStream
.range(-42, +42)
.parallel()
.map(i -> Thread.activeCount())
.max()
.ifPresent(System.out::println);
ForkJoinPool.commonPool().submit(r).join();
new ForkJoinPool(42).submit(r).join();
This can produce on a 4-core CPU an output like:
5 // common pool
23 // custom pool
Without .parallel()
it gives:
3 // common pool
4 // custom pool
The original solution (setting the ForkJoinPool common parallelism property) no longer works. Looking at the links in the original answer, an update which breaks this has been back ported to Java 8. As mentioned in the linked threads, this solution was not guaranteed to work forever. Based on that, the solution is the forkjoinpool.submit with .get solution discussed in the accepted answer. I think the backport fixes the unreliability of this solution also.
ForkJoinPool fjpool = new ForkJoinPool(10);
System.out.println("stream.parallel");
IntStream range = IntStream.range(0, 20);
fjpool.submit(() -> range.parallel()
.forEach((int theInt) ->
{
try { Thread.sleep(100); } catch (Exception ignore) {}
System.out.println(Thread.currentThread().getName() + " -- " + theInt);
})).get();
System.out.println("list.parallelStream");
int [] array = IntStream.range(0, 20).toArray();
List<Integer> list = new ArrayList<>();
for (int theInt: array)
{
list.add(theInt);
}
fjpool.submit(() -> list.parallelStream()
.forEach((theInt) ->
{
try { Thread.sleep(100); } catch (Exception ignore) {}
System.out.println(Thread.currentThread().getName() + " -- " + theInt);
})).get();
If you don't need a custom ThreadPool but you rather want to limit the number of concurrent tasks, you can use:
List<Path> paths = List.of("/path/file1.csv", "/path/file2.csv", "/path/file3.csv").stream().map(e -> Paths.get(e)).collect(toList());
List<List<Path>> partitions = Lists.partition(paths, 4); // Guava method
partitions.forEach(group -> group.parallelStream().forEach(csvFilePath -> {
// do your processing
}));
(Duplicate question asking for this is locked, so please bear me here)
you can try implementing this ForkJoinWorkerThreadFactory and inject it to Fork-Join class.
public ForkJoinPool(int parallelism,
ForkJoinWorkerThreadFactory factory,
UncaughtExceptionHandler handler,
boolean asyncMode) {
this(checkParallelism(parallelism),
checkFactory(factory),
handler,
asyncMode ? FIFO_QUEUE : LIFO_QUEUE,
"ForkJoinPool-" + nextPoolId() + "-worker-");
checkPermission();
}
you can use this constructor of Fork-Join pool to do this.
notes:-- 1. if you use this, take into consideration that based on your implementation of new threads, scheduling from JVM will be affected, which generally schedules fork-join threads to different cores(treated as a computational thread). 2. task scheduling by fork-join to threads won't get affected. 3. Haven't really figured out how parallel stream is picking threads from fork-join(couldn't find proper documentation on it), so try using a different threadNaming factory so as to make sure, if threads in parallel stream are being picked from customThreadFactory that you provide. 4. commonThreadPool won't use this customThreadFactory.