Using ThreadPoolTaskSupport as tasksupport for parallel collections in scala

只愿长相守 submitted on 2019-12-25 04:21:52

Question


I have a list of 50 to 500 IO-bound tasks, each of which may run for about 3 to 20 minutes. The list is computed up front, and no new tasks are added recursively. I want to run them all with a fixed pool size (say 4). The tasks do not depend on each other.

So I guess this fits the ThreadPoolTaskSupport use case better than ForkJoinTaskSupport. But ThreadPoolTaskSupport is deprecated, and ForkJoinTaskSupport is the only recommended alternative.

I tried ForkJoinTaskSupport, but it seems that once only the last 8 or so tasks are left, it stops using most of its threads, so the last 3-4 tasks run on a single thread, adding 1 hour to the total run time.

Is there any way to fix this behavior in ForkJoinTaskSupport, or should I use ThreadPoolTaskSupport with a fixed size despite it being deprecated?

Code to reproduce this:

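// Scala 2.12: ForkJoinTaskSupport takes a java.util.concurrent.ForkJoinPool.
// On 2.13+, parallel collections are a separate module and .par also needs
// import scala.collection.parallel.CollectionConverters._
import java.util.concurrent.ForkJoinPool
import scala.collection.mutable.ListBuffer
import scala.collection.parallel.ForkJoinTaskSupport

object ThreadLibTest {
  def main(arr: Array[String]): Unit = {
    // 65 independent tasks, each blocking for about 1 second
    val tasks = ListBuffer.empty[() => Unit]
    for (i <- 1 to 65) {
      tasks += { () =>
        Thread.sleep(1000)
        // println("finishing " + i + " id " + Thread.currentThread().getName)
      }
    }

    val ptasks = tasks.par
    val fjp = new ForkJoinPool(6)
    ptasks.tasksupport = new ForkJoinTaskSupport(fjp)
    // Log the pool state, then run the task
    ptasks.map(x => { logfjp(fjp); x.apply() })
    fjp.shutdown()
  }

  def logfjp(pool: ForkJoinPool): Unit = {
    println(
      " activeThreads=" + pool.getActiveThreadCount() +
      " runningThreads=" + pool.getRunningThreadCount() +
      " poolSize=" + pool.getPoolSize() +
      " queuedTasks=" + pool.getQueuedTaskCount() +
      " queuedSubmissions=" + pool.getQueuedSubmissionCount() +
      " parallelism=" + pool.getParallelism() +
      " stealCount=" + pool.getStealCount())
  }
}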
Last few lines of output:
 activeThreads=6 runningThreads=2 poolSize=6 queuedTasks=2 queuedSubmissions=0 parallelism=6 stealCount=0
 activeThreads=6 runningThreads=2 poolSize=6 queuedTasks=2 queuedSubmissions=0 parallelism=6 stealCount=0
 activeThreads=6 runningThreads=2 poolSize=6 queuedTasks=2 queuedSubmissions=0 parallelism=6 stealCount=0
 activeThreads=6 runningThreads=1 poolSize=6 queuedTasks=2 queuedSubmissions=0 parallelism=6 stealCount=0
 activeThreads=6 runningThreads=2 poolSize=6 queuedTasks=1 queuedSubmissions=0 parallelism=6 stealCount=0
 activeThreads=4 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=0
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=2 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
 activeThreads=1 runningThreads=1 poolSize=6 queuedTasks=0 queuedSubmissions=0 parallelism=6 stealCount=3
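As a rough yardstick for the log above, here is a sketch (not from the question) of the best-case wall-clock time for the test program: 65 one-second tasks on 6 threads need at least ceil(65 / 6) = 11 rounds, so about 11 s if every thread stays busy until the end. The object `MakespanBound` and its `lowerBound` helper are hypothetical names, and the bound ignores scheduling and logging overhead.

```scala
object MakespanBound {
  // Best-case wall-clock time for `n` independent tasks that each take
  // `taskSeconds`, spread over `threads` workers that never sit idle:
  // ceil(n / threads) rounds of one task per worker.
  def lowerBound(n: Int, threads: Int, taskSeconds: Int): Int = {
    val rounds = (n + threads - 1) / threads // integer ceiling division
    rounds * taskSeconds
  }

  def main(args: Array[String]): Unit = {
    // Numbers taken from the test code: 65 tasks, Thread.sleep(1000), ForkJoinPool(6)
    println(lowerBound(65, 6, 1)) // prints 11
  }
}
```

A measured time noticeably above this bound means some workers sat idle while tasks were still unfinished, which is consistent with the falling activeThreads values in the log.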

EDIT: I used a fixed-size ThreadPoolExecutor as the taskSupport and hit the same issue: only 2 of the 6 threads were executing the last several tasks.

Explicitly submitting the tasks to a ThreadPoolExecutor gave the expected result: all threads stayed active until every task had finished. I have yet to figure out why setting the same pool as the taskSupport does not work. Explicitly submitting the tasks to either a ForkJoinPool or a ThreadPoolExecutor finishes in 11 seconds, while tasks.par with either one as the taskSupport takes 17 seconds. Maybe this is due to shouldSplitFurther, defined in IterableSplitter?
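The explicit-submission approach mentioned above can be sketched as follows, assuming Scala 2.12 or later and only the standard library. This is one possible way to submit tasks explicitly, not necessarily the code used in the question: each task is wrapped in its own `Future` on a pool from `Executors.newFixedThreadPool`, so a worker that finishes a task takes the next queued one instead of depending on how a parallel collection splits the work. `FixedPoolRunner` and `runAll` are hypothetical names.

```scala
import java.util.concurrent.Executors
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object FixedPoolRunner {
  // Runs each task as its own Future on a fixed-size thread pool and blocks
  // until all of them have completed. Returns the names of the worker threads used.
  def runAll(tasks: Seq[() => Unit], poolSize: Int): Set[String] = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    val threadsUsed = TrieMap.empty[String, Unit] // thread-safe set of thread names
    try {
      // One Runnable per task lands in the pool's queue; a worker that
      // finishes a task simply picks up the next one.
      val all = Future.traverse(tasks) { task =>
        Future {
          threadsUsed.put(Thread.currentThread().getName, ())
          task()
        }
      }
      Await.result(all, Duration.Inf)
    } finally {
      pool.shutdown() // let the worker threads exit once the queue is drained
    }
    threadsUsed.keySet.toSet
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical workload: 12 tasks of 100 ms each on 4 threads
    val tasks = Seq.fill(12)(() => Thread.sleep(100))
    val start = System.nanoTime()
    val used = runAll(tasks, 4)
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"threads used: ${used.size}, elapsed: $elapsedMs ms")
  }
}
```

Because the pool is dedicated to these blocking tasks, its size directly caps how many run at once, which is the fixed-size behavior the question asks for.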

Source: https://stackoverflow.com/questions/55703332/using-threadpooltasksupport-as-tasksupport-for-parallel-collections-in-scala
