I'm a novice with Hadoop and I'm getting familiar with the MapReduce style of programming, but I've run into a problem: sometimes I need only the map phase for a job and I only need the map
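You can also use the IdentityReducer, which writes the map output through unchanged (note that the shuffle and sort still run, unlike with zero reducers):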
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/IdentityReducer.html
If you use Oozie as the scheduler to manage your Hadoop jobs, you can simply set the property mapred.reduce.tasks (the default number of reduce tasks per job) to 0. Specify your mapper in the property mapreduce.map.class; there is no need to set mapreduce.reduce.class, since no reducers are required.
<configuration>
<property>
<name>mapreduce.map.class</name>
<value>my.com.package.AbcMapper</value>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>0</value>
</property>
.
.
.
</configuration>
This turns off the reducer.
job.setNumReduceTasks(0);
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#setNumReduceTasks(int)
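For context, here is a minimal sketch of a complete map-only driver built around that call, assuming the newer org.apache.hadoop.mapreduce API (Hadoop 2 or later). The class names MapOnlyJob and UpperCaseMapper and the upper-casing logic are made up for illustration; only the setNumReduceTasks(0) line is the point.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJob {

    // Hypothetical mapper: emits each input line upper-cased, keyed by its byte offset.
    public static class UpperCaseMapper
            extends Mapper<LongWritable, Text, LongWritable, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(offset, new Text(line.toString().toUpperCase()));
        }
    }

    // Builds the job without submitting it, so the settings can be inspected.
    public static Job buildJob(Configuration conf, String in, String out)
            throws IOException {
        Job job = Job.getInstance(conf, "map-only example");
        job.setJarByClass(MapOnlyJob.class);
        job.setMapperClass(UpperCaseMapper.class);
        // Zero reducers: no shuffle or sort, mapper output goes straight to the output directory.
        job.setNumReduceTasks(0);
        // With no reduce phase, these are the types the mapper itself writes.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = buildJob(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With zero reducers, each map task writes its own output file, named part-m-00000, part-m-00001 and so on, instead of the part-r-* files a reducer would produce.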
This can be quite helpful when you need to launch a mapper-only job from the terminal. You can turn off the reducers by specifying 0 reducers directly in the hadoop jar command:
-D mapred.reduce.tasks=0
The resulting command looks like this:
hadoop jar myJob.jar -D mapred.reduce.tasks=0 -input myInputDirs -output myOutputDir
To be backward compatible, Hadoop also supports the "-reduce NONE" option, which is equivalent to "-D mapred.reduce.tasks=0".
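One caveat if the jar contains your own Java driver rather than a streaming job: generic options such as -D are only applied when the driver runs through ToolRunner (or uses GenericOptionsParser itself). A minimal sketch, assuming the org.apache.hadoop.mapreduce API; the class name ConfigurableJob and the positional input/output arguments are made up for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ConfigurableJob extends Configured implements Tool {

    // args holds only what is left after ToolRunner has consumed the generic options.
    public Job buildJob(String[] args) throws IOException {
        // getConf() already contains every -D key=value pair from the command line.
        Job job = Job.getInstance(getConf(), "configurable job");
        job.setJarByClass(ConfigurableJob.class);
        // No mapper is set, so Hadoop's default identity Mapper is used.
        // setNumReduceTasks is deliberately not called: the command line decides.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job;
    }

    @Override
    public int run(String[] args) throws Exception {
        return buildJob(args).waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new ConfigurableJob(), args));
    }
}
```

Such a driver could be launched as hadoop jar myJob.jar ConfigurableJob -D mapred.reduce.tasks=0 myInputDirs myOutputDir, with the generic options placed before the job's own arguments. Newer Hadoop releases call the property mapreduce.job.reduces and treat mapred.reduce.tasks as a deprecated alias for it.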