I explicitly specify the number of mappers within my Java program using conf.setNumMapTasks(), but when the job ends, the counter shows that the number of launched map tasks differs from the value I specified. Why does Hadoop not respect this setting?
Quoting the javadoc of JobConf#setNumMapTasks():
Note: This is only a hint to the framework. The actual number of spawned map tasks depends on the number of InputSplits generated by the job's InputFormat.getSplits(JobConf, int). A custom InputFormat is typically used to accurately control the number of map tasks for the job.
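One way to see this is to ask the configured InputFormat for its splits directly: the split count, not the hint, is what the framework launches. A minimal sketch against the old mapred API (the class name and the input path argument are placeholders):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

public class SplitCountDemo {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitCountDemo.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));

        conf.setNumMapTasks(4); // only a hint to the framework

        // The hint is passed as the second argument, but the InputFormat
        // is free to ignore it when it computes the splits.
        InputSplit[] splits =
            conf.getInputFormat().getSplits(conf, conf.getNumMapTasks());
        System.out.println("Map tasks that will actually launch: " + splits.length);
    }
}
```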
Hadoop also relaunches failed map tasks and speculatively re-executes long-running ones to provide fault tolerance, which can inflate the launched-task counter beyond the number of splits.
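If those duplicate speculative launches are unwanted, the old JobConf API lets you switch them off; a sketch, assuming the same conf object as in the question:

```java
// Retries of failed attempts still happen, but slow tasks are no
// longer speculatively duplicated, so the launched-task counter
// stays closer to the number of input splits.
conf.setMapSpeculativeExecution(false);
conf.setReduceSpeculativeExecution(false);
```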
You can limit the number of map tasks running concurrently on a single node (via mapred.tasktracker.map.tasks.maximum). And provided your input files are large, you can also limit the number of launched tasks: write your own InputFormat class that is not splittable, and Hadoop will then run exactly one map task per input file, as sketched below.
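A minimal sketch of such a non-splittable InputFormat, assuming plain-text input (the class name is made up; extending TextInputFormat keeps its record-reading behavior):

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Never split files: Hadoop then generates exactly one InputSplit,
// and therefore one map task, per input file.
public class NonSplittableTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false;
    }
}
```

You would then register it with conf.setInputFormat(NonSplittableTextInputFormat.class).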