I explicitly specify the number of mappers within my Java program using conf.setNumMapTasks(), but when the job ends, the counter shows that the number of launched map tasks differs from the value I set.
According to the Hadoop API, JobConf.setNumMapTasks() is only a hint to the Hadoop runtime. The actual number of map tasks equals the number of input splits generated from the input data, which by default is one per HDFS block.
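If you want to actually influence the mapper count, one common approach is to raise the minimum split size so that fewer, larger splits (and therefore fewer map tasks) are created. A minimal sketch using the old mapred API; the class name, job name, and the 256 MB split size are illustrative assumptions, not values from your setup:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MapperCountDemo {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MapperCountDemo.class);
            conf.setJobName("mapper-count-demo"); // hypothetical job name

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Hint only: the runtime may launch a different number of maps.
            conf.setNumMapTasks(4);

            // Raising the minimum split size (here 256 MB, illustrative)
            // produces fewer splits and hence fewer map tasks.
            conf.set("mapred.min.split.size",
                     String.valueOf(256L * 1024 * 1024));

            JobClient.runJob(conf);
        }
    }

Since no mapper or reducer class is set, the old API falls back to the identity implementations, so this runs as-is given valid input/output paths.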
However, it is possible to configure the number of map/reduce slots per node using the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties in mapred-site.xml. This way you can control how many mappers/reducers execute in parallel across the entire cluster (slots per node times number of nodes), even though the total number of map tasks is still determined by the input splits.
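For example, a mapred-site.xml entry limiting each TaskTracker to 4 concurrent map tasks and 2 concurrent reduce tasks might look like this (the values are illustrative):

    <configuration>
      <!-- Max concurrent map tasks per TaskTracker node -->
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>4</value>
      </property>
      <!-- Max concurrent reduce tasks per TaskTracker node -->
      <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>2</value>
      </property>
    </configuration>

With these settings on a 10-node cluster, at most 40 map tasks and 20 reduce tasks would run concurrently cluster-wide; any remaining tasks wait for a free slot.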