I'm very new to MapReduce and have just completed a Hadoop word-count example.
In that example it produces an unsorted file (of key-value pairs) of word counts. So is i
As you said, one possibility is to write two jobs to do this.
First job: the simple word-count example.
Second job: does the sorting.
Note: the output file generated by the first job is the input for the second job.
The pseudocode for the second job could be:
Mapper2(Text _key, IntWritable _value) {
    // Swap _key and _value so that the count becomes the key.
    // The framework sorts map output by key during the shuffle, so the reducer receives the counts in sorted order.
    emit(_value, _key);
}
Reduce2(IntWritable valueOfMapper2, Iterable<Text> keysOfMapper2) {
    // On the reducer side, all the words that have the same count arrive together in one call.
    for each K in keysOfMapper2 {
        emit(K, valueOfMapper2); // Counts arrive in ascending order by default.
    }
}
You can also sort in descending order by writing a separate comparator class and registering it on the job:
job.setSortComparatorClass(DescendingIntComparator.class); // DescendingIntComparator is a placeholder name for your own comparator class
This comparator sorts the second job's keys (the counts) in descending order before they reach the reducer, so on the reducer you just emit the pairs as they arrive. For a single, globally sorted output file, run the second job with one reducer.
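To make the swap-and-sort idea concrete, here is a minimal plain-Java simulation of the second job. It is not Hadoop code: the class name SortByCountSketch, the method sortByCount, and the sample words are all made up for illustration, and a TreeMap stands in for the shuffle phase that groups values by key and sorts the keys. The Comparator argument plays the role that the sort comparator plays on a real job.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the second job (no Hadoop dependency).
// SortByCountSketch and sortByCount are hypothetical names used for illustration.
final class SortByCountSketch {

    // keyOrder stands in for the job's sort comparator:
    // Comparator.naturalOrder() gives ascending counts,
    // Comparator.reverseOrder() gives descending counts.
    static List<String> sortByCount(Map<String, Integer> wordCounts,
                                    Comparator<Integer> keyOrder) {
        // "Map" phase: swap (word, count) to (count, word).
        // The TreeMap imitates the shuffle: it groups words by count
        // and keeps the counts ordered according to keyOrder.
        TreeMap<Integer, List<String>> shuffled = new TreeMap<>(keyOrder);
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            shuffled.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                    .add(e.getKey());
        }

        // "Reduce" phase: for each count, emit every word with that count.
        List<String> output = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : shuffled.entrySet()) {
            for (String word : group.getValue()) {
                output.add(word + "\t" + group.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Stand-in for the output of the first (word-count) job.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 3);
        counts.put("map", 1);
        counts.put("reduce", 7);
        counts.put("sort", 3);

        sortByCount(counts, Comparator.reverseOrder())
                .forEach(System.out::println);
    }
}
```

With the sample input in main, the descending call prints reduce 7, hadoop 3, sort 3, map 1, one tab-separated pair per line. Passing Comparator.naturalOrder() instead gives the ascending order. In this sketch, words that share a count keep their input order; a real job makes no such promise about the order of values within one key.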