How do I know if a Fork/Join pool has enough threads in Java?

Question

I am trying to implement a divide-and-conquer solution for some large data set. I use Fork/Join to break the work down into tasks that run on multiple threads. However, I have a question about the fork mechanism. Suppose I set my divide-and-conquer condition as:

@Override
protected SomeClass compute() {
    if (list.size() < LIMIT) {
        // Small enough: process this sublist directly in the current thread
        ...
    } else {
        // Divide the list in half and run both halves as subtasks
        int mid = list.size() / 2;
        SomeRecursiveTaskClass subWorker1 = new SomeRecursiveTaskClass(list.subList(0, mid));
        SomeRecursiveTaskClass subWorker2 = new SomeRecursiveTaskClass(list.subList(mid, list.size()));
        invokeAll(subWorker1, subWorker2);
        ...
    }
}

What will happen if there are not enough resources to invoke a subWorker (e.g., not enough threads in the pool)? Does the Fork/Join framework keep track of how many threads are available in the pool, or should I add this condition to my divide-and-conquer logic?

Answer 1:

Each ForkJoinPool has a configured target parallelism. This does not exactly match the number of threads: for example, if a worker thread is about to wait via a ManagedBlocker, the pool may start additional threads to compensate. The parallelism of the commonPool defaults to “number of CPU cores minus one”, so when the initiating non-pool thread also participates as a helper, the resulting parallelism uses all CPU cores.
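A quick way to see these numbers on your own machine is a small program like the following sketch (the class and method names are only for illustration). It prints the core count, the common pool's target parallelism, how many worker threads the common pool has started so far, and the parallelism of a dedicated pool created with an explicit level. The exact values depend on your hardware and on whether the `java.util.concurrent.ForkJoinPool.common.parallelism` system property is set.

```java
import java.util.concurrent.ForkJoinPool;

public class PoolParallelism {
    // Creates a dedicated pool with an explicit target parallelism and reports it back.
    static int customParallelism(int level) {
        ForkJoinPool pool = new ForkJoinPool(level);
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ForkJoinPool common = ForkJoinPool.commonPool();
        System.out.println("CPU cores:              " + cores);
        // Usually cores - 1 (minimum 1), unless the system property
        // java.util.concurrent.ForkJoinPool.common.parallelism overrides it.
        System.out.println("commonPool parallelism: " + common.getParallelism());
        // Number of worker threads started so far; not necessarily equal to the parallelism.
        System.out.println("commonPool threads now: " + common.getPoolSize());
        // A dedicated pool uses whatever parallelism you pass to its constructor.
        System.out.println("custom pool (4):        " + customParallelism(4));
    }
}
```

If the common pool is shared with other code in your application, creating a dedicated pool like this is one way to control the parallelism your own tasks get.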

When you submit more jobs than there are threads, the extra jobs are enqueued. Enqueuing a few extra jobs helps keep the threads busy: not all jobs take exactly the same time, so threads that run out of work can steal queued jobs from other threads. Splitting the work too finely, however, creates unnecessary overhead.
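To illustrate that forking more subtasks than there are threads is not a problem in itself, here is a minimal, self-contained sketch modeled on the question's compute() method. The class names, the LIMIT value of 16, and the summing workload are assumptions chosen for illustration. The pool is deliberately created with a target parallelism of 1, yet the task forks thousands of subtasks and still completes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class OversubscribedSum {
    // Hypothetical leaf size, chosen small on purpose so that thousands of tasks get forked.
    static final int LIMIT = 16;

    static class SumTask extends RecursiveTask<Long> {
        private final List<Integer> list;

        SumTask(List<Integer> list) {
            this.list = list;
        }

        @Override
        protected Long compute() {
            if (list.size() < LIMIT) {
                // Small enough: sum this sublist directly
                long sum = 0;
                for (int x : list) {
                    sum += x;
                }
                return sum;
            }
            int mid = list.size() / 2;
            SumTask left = new SumTask(list.subList(0, mid));
            SumTask right = new SumTask(list.subList(mid, list.size()));
            // invokeAll does not need a free thread per subtask: forked subtasks are queued
            // on the worker's deque and run later by this thread or stolen by an idle one.
            invokeAll(left, right);
            return left.join() + right.join();
        }
    }

    // Sums 1..n on a pool whose target parallelism is a single worker thread.
    static long sumOnSingleThreadPool(int n) {
        List<Integer> data = new ArrayList<>(n);
        for (int i = 1; i <= n; i++) {
            data.add(i);
        }
        ForkJoinPool pool = new ForkJoinPool(1);
        try {
            return pool.invoke(new SumTask(data));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOnSingleThreadPool(100_000)); // 5000050000
    }
}
```

A LIMIT this small is exactly the over-splitting described above: the result is correct, but each tiny task adds scheduling overhead.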

Therefore, you can use ForkJoinTask.getSurplusQueuedTaskCount() to get an estimate of how many of the current worker's queued jobs are unlikely to be stolen by other threads, and split only while that number is below a small threshold. As its documentation states:

This value may be useful for heuristic decisions about whether to fork other tasks. In many usages of ForkJoinTasks, at steady state, each worker should aim to maintain a small constant surplus (for example, 3) of tasks, and to process computations locally if this threshold is exceeded.

So this is the condition to use when deciding whether to split your jobs further. Because this number drops when idle threads steal the jobs you created, it balances the load when jobs have different CPU costs. It also works the other way round: if the pool is shared (like the common pool) and its threads are already busy, they will not pick up your jobs, so the surplus count stays high and you automatically stop splitting.
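Applied to the question's compute() method, the heuristic might look like the following sketch. It sums a long[] over index ranges instead of a List; MIN_SPLIT and the class names are hypothetical, and the threshold of 3 is simply the example value from the Javadoc quoted above.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SurplusSplitSum {
    // Hypothetical minimum chunk size worth splitting at all.
    static final int MIN_SPLIT = 1_000;
    // The "small constant surplus" example value from the ForkJoinTask Javadoc.
    static final int SURPLUS_THRESHOLD = 3;

    static class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int from, to; // half-open range [from, to)

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            int size = to - from;
            // Split only while the range is large enough AND this worker holds only a
            // small surplus of queued tasks beyond what other workers might steal.
            if (size >= MIN_SPLIT && getSurplusQueuedTaskCount() <= SURPLUS_THRESHOLD) {
                int mid = from + size / 2;
                SumTask left = new SumTask(data, from, mid);
                SumTask right = new SumTask(data, mid, to);
                invokeAll(left, right);
                return left.join() + right.join();
            }
            // Otherwise process the whole range locally.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
    }

    static long parallelSum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        Arrays.fill(data, 2L);
        System.out.println(parallelSum(data)); // 2000000
    }
}
```

Keeping a minimum size alongside the surplus check is a design choice, not something the Javadoc requires: it avoids forking very small tasks when idle workers keep the surplus count near zero.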

Source: https://stackoverflow.com/questions/48157596/how-do-i-know-if-fork-and-join-has-enough-pool-size-in-java

Tags

java

multithreading

fork-join

forkjoinpool