how to limit the number of mappers

半阙折子戏 2021-01-21 09:41

I explicitly specify the number of mappers within my Java program using conf.setNumMapTasks(), but when the job ends, the counter shows that the number of launched map tasks differs from the value I specified.

4 Answers
  •  一生所求
    2021-01-21 09:48

    Quoting the javadoc of JobConf#setNumMapTasks():

    Note: This is only a hint to the framework. The actual number of spawned map tasks depends on the number of InputSplits generated by the job's InputFormat.getSplits(JobConf, int). A custom InputFormat is typically used to accurately control the number of map tasks for the job.
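    For illustration, here is a minimal sketch (old mapred API, with a hypothetical driver class name) of how that hint is passed, and why it may be ignored:

        import org.apache.hadoop.mapred.JobConf;

        public class MapCountHint {
            public static void main(String[] args) {
                JobConf conf = new JobConf(MapCountHint.class);
                // Only a hint: the framework may launch a different number.
                // The actual count equals the number of InputSplits returned by
                // the job's InputFormat.getSplits(conf, numSplits).
                conf.setNumMapTasks(4);
            }
        }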

    Hadoop also relaunches failed or long-running map tasks in order to provide high availability, so the launched-tasks counter can end up higher than the number of splits.

    You can limit the number of map tasks running concurrently on a single node. And, provided your input files are large, you can limit the number of launched tasks by writing your own InputFormat class that is not splittable. Hadoop will then run one map task for every input file you have, as sketched below.
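    A minimal sketch of such a non-splittable InputFormat, using the old mapred API to match the JobConf in the question (the class name is made up):

        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.mapred.TextInputFormat;

        public class WholeFileTextInputFormat extends TextInputFormat {
            @Override
            protected boolean isSplitable(FileSystem fs, Path file) {
                // Never split a file: each input file becomes exactly one
                // InputSplit, so Hadoop launches exactly one map task per file.
                return false;
            }
        }

    Register it with conf.setInputFormat(WholeFileTextInputFormat.class). The per-node concurrency limit mentioned above is a TaskTracker setting (mapred.tasktracker.map.tasks.maximum in the classic property names), not something the job itself controls.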
