I explicitly specify the number of mappers within my Java program using conf.setNumMapTasks(), but when the job ends, the counter shows that the number of launched map tasks differs from the value I specified. Why does Hadoop not respect this setting?
Quoting the javadoc of JobConf#setNumMapTasks():
Note: This is only a hint to the framework. The actual number of spawned map tasks depends on the number of InputSplits generated by the job's InputFormat.getSplits(JobConf, int). A custom InputFormat is typically used to accurately control the number of map tasks for the job.
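One way to see this is to ask the configured InputFormat for its splits directly: the split count, not the hint, is what the framework launches. A minimal sketch against the old mapred API (the class name and the input path argument are placeholders):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

public class SplitCountDemo {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitCountDemo.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));

        conf.setNumMapTasks(4); // only a hint to the framework

        // The hint is passed as the second argument, but the InputFormat
        // is free to ignore it when it computes the splits.
        InputSplit[] splits =
            conf.getInputFormat().getSplits(conf, conf.getNumMapTasks());
        System.out.println("Map tasks that will actually launch: " + splits.length);
    }
}
```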
Hadoop also relaunches failed map tasks and speculatively re-executes long-running ones to provide fault tolerance, which can inflate the launched-task counter beyond the number of splits.
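If those duplicate speculative launches are unwanted, the old JobConf API lets you switch them off; a sketch, assuming the same conf object as in the question:

```java
// Retries of failed attempts still happen, but slow tasks are no
// longer speculatively duplicated, so the launched-task counter
// stays closer to the number of input splits.
conf.setMapSpeculativeExecution(false);
conf.setReduceSpeculativeExecution(false);
```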
You can limit the number of map tasks running concurrently on a single node (via mapred.tasktracker.map.tasks.maximum). And provided your input files are large, you can also limit the number of launched tasks: write your own InputFormat class that is not splittable, and Hadoop will then run exactly one map task per input file, as sketched below.
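A minimal sketch of such a non-splittable InputFormat, assuming plain-text input (the class name is made up; extending TextInputFormat keeps its record-reading behavior):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Never split files: Hadoop then generates exactly one InputSplit,
// and therefore one map task, per input file.
public class NonSplittableTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}
```

You would then register it with conf.setInputFormat(NonSplittableTextInputFormat.class).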