Question
I have a list of 50 to 500 IO-bound tasks, each of which may run for anywhere from about 3 to 20 minutes. The list is computed up front, and no new tasks are added recursively. I want to run them all on a fixed-size pool (say, 4 threads). The tasks do not depend on one another.
So I think this fits the ThreadPoolTaskSupport use case better than ForkJoinTaskSupport. But ThreadPoolTaskSupport is deprecated, and ForkJoinTaskSupport is the only recommended alternative.
I tried ForkJoinTaskSupport, but when the last 8 or so tasks are left it seems to start killing threads, so the last 3-4 tasks run on a single thread, adding about 1 hour to the total run time.
Is there any way to fix this behavior in ForkJoinTaskSupport, or should I use ThreadPoolTaskSupport with a fixed size despite it being deprecated?
Code to reproduce this:
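import java.util.concurrent.ForkJoinPool
import scala.collection.mutable.ListBuffer
import scala.collection.parallel.ForkJoinTaskSupport
// Assumes Scala 2.12; on 2.13+ `.par` also needs the scala-parallel-collections
// module and `import scala.collection.parallel.CollectionConverters._`.

object ThreadLibTest {
  def main(arr: Array[String]): Unit = {
    // 65 independent tasks, each simulating blocking IO for one second.
    val tasks = ListBuffer.empty[() => Unit]
    for (i <- 1 to 65) {
      tasks += { () =>
        Thread.sleep(1000)
        // println("finishing " + i + " id " + Thread.currentThread().getName)
      }
    }
    val ptasks = tasks.par
    val fjp = new ForkJoinPool(6)
    ptasks.tasksupport = new ForkJoinTaskSupport(fjp)
    // Log the pool state just before each task runs.
    ptasks.map(x => { logfjp(fjp); x.apply() })
  }

  def logfjp(pool: ForkJoinPool): Unit = {
    println(
      " activeThreads=" + pool.getActiveThreadCount() +
      " runningThreads=" + pool.getRunningThreadCount() +
      " poolSize=" + pool.getPoolSize() +
      " queuedTasks=" + pool.getQueuedTaskCount() +
      " queuedSubmissions=" + pool.getQueuedSubmissionCount() +
      " parallelism=" + pool.getParallelism() +
      " stealCount=" + pool.getStealCount())
  }
}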
The last few lines of output:
activeThreads=6 runningThreads=2 poolSize=6 queuedTasks=2 queuedSubmissions=0 parallelism=6 stealCount=0
activeThreads=6 runningThreads=2 poolSize=6 queuedTasks=2 queuedSubmissions=0 parallelism=6 stealCount=0
activeThreads=6 runningThreads=2 poolSize=6 queuedTasks=2 queuedSubmissions=0 parallelism=6 stealCount=0
activeThreads=6 runningThreads=1 poolSize=6 queuedTasks=2 queuedSubmissions=0 parallelism=6 stealCount=0
activeThreads=6 runningThreads=2 poolSize=6 queuedTasks=1 queuedSubmissions=0 parallelism=6 stealCount=0
activeThreads=4 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=0
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
activeThreads=1 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
EDIT: I used a fixed-size ThreadPoolExecutor as the task support and hit the same issue: only 2 of the 6 threads were executing the last several tasks.
Explicitly submitting the tasks to a ThreadPoolExecutor gave the correct result: all threads stayed active until every task had finished. I have yet to figure out why setting it as the task support does not work.
Explicitly submitting the tasks to a ForkJoinPool or a ThreadPoolExecutor finishes in 11 secs, while tasks.par with either one takes 17 secs.
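For reference, here is a minimal sketch of the kind of explicit submission described above, assuming a plain java.util.concurrent fixed-size pool rather than parallel collections. The names FixedPoolSketch and runAll are hypothetical and exist only for this illustration; this is not necessarily the exact code that produced the timings. With 65 one-second tasks on 6 threads, the work needs ceil(65 / 6) = 11 rounds, which is consistent with the 11 secs reported for explicit submission.

```scala
import java.util.concurrent.{Callable, Executors, Future => JFuture}

// Hypothetical helper for illustration only.
object FixedPoolSketch {
  // Runs every task on a fixed-size pool and returns the elapsed
  // wall-clock time in milliseconds.
  def runAll(tasks: Seq[() => Unit], poolSize: Int): Long = {
    val pool = Executors.newFixedThreadPool(poolSize)
    val start = System.nanoTime()
    try {
      // Submit everything up front. Each idle worker pulls the next task
      // from the pool's shared queue, so no thread sits idle while work remains.
      // toVector forces strict evaluation in case a lazy sequence is passed in.
      val futures: Vector[JFuture[Unit]] = tasks.toVector.map { task =>
        pool.submit(new Callable[Unit] { def call(): Unit = task() })
      }
      // Block until every task is done; a failed task surfaces here
      // as an ExecutionException.
      futures.foreach(_.get())
    } finally {
      pool.shutdown()
    }
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    // Same workload shape as the test above: 65 tasks, 1 second each, 6 threads.
    val tasks = Seq.fill(65)(() => Thread.sleep(1000))
    val elapsedMs = runAll(tasks, poolSize = 6)
    println("elapsed ms: " + elapsedMs)
  }
}
```

The design choice here is that tasks are handed to the pool one by one instead of being pre-partitioned into chunks, so the assignment of tasks to threads is decided only when a thread becomes free.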
Maybe this is due to shouldSplitFurther, defined in IterableSplitter?
Source: https://stackoverflow.com/questions/55703332/using-threadpooltasksupport-as-tasksupport-for-parallel-collections-in-scala