Sorted word count using Hadoop MapReduce

鱼传尺愫 2020-12-16 02:37

I'm very new to MapReduce and have completed a Hadoop word-count example.

That example produces an unsorted file (of key-value pairs) of word counts. So is i

4 answers
  •  时光说笑
    2020-12-16 02:48

    As you said, one possibility is to write two jobs. First job: the simple word-count example.

    Second job: does the sorting.

    Note: the output file generated by the first job is the input for the second job.

    The pseudocode could be:

        Mapper2(Text word, IntWritable count) {
            // Swap key and value. The framework sorts map output by key during
            // the shuffle, so the reducer receives the counts in sorted order.
            emit(count, word);
        }

        Reduce2(IntWritable count, Iterable<Text> words) {
            // On the reducer side, all words that have the same count are grouped together.
            for each word in words {
                emit(word, count); // counts arrive in ascending order
            }
        }
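
    To make the swap-and-sort idea concrete, here is a minimal plain-Java sketch of the second job. It does not use any Hadoop classes: a TreeMap stands in for the shuffle phase (which sorts map output by key), and the class and method names (SwapAndSortSketch, mapper2, runJob2) are made up for illustration. It assumes the first job wrote lines of the form word<TAB>count.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the second job; no Hadoop classes are used.
class SwapAndSortSketch {

    // "Map" step: parse one "word<TAB>count" line from job 1 and swap it to (count, word).
    static Map.Entry<Integer, String> mapper2(String line) {
        String[] parts = line.split("\t");
        return new AbstractMap.SimpleEntry<>(Integer.parseInt(parts[1]), parts[0]);
    }

    // "Shuffle" + "reduce" steps: group words by count in ascending key order,
    // then emit (word, count) for every word in each group.
    static List<String> runJob2(List<String> job1Output) {
        // The TreeMap keeps its keys sorted, imitating the sort done by the shuffle.
        TreeMap<Integer, List<String>> shuffled = new TreeMap<>();
        for (String line : job1Output) {
            Map.Entry<Integer, String> kv = mapper2(line);
            shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : shuffled.entrySet()) {
            for (String word : group.getValue()) {
                out.add(word + "\t" + group.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical output of the first (word-count) job.
        List<String> job1Output = Arrays.asList("hadoop\t3", "map\t1", "reduce\t2", "sort\t1");
        runJob2(job1Output).forEach(System.out::println);
    }
}
```

    With the sample input above, the words come out ordered by count: map and sort (1), then reduce (2), then hadoop (3).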
    

    You can also sort in descending order by writing a separate comparator class (called DescendingComparator here). Register the comparator on the job:

        job.setSortComparatorClass(DescendingComparator.class);

    The comparator sorts the map output keys (here, the counts) in descending order before they reach the reducer, so the reducer just emits them.
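
    The effect of such a comparator can be sketched in plain Java. This is only an illustration under stated assumptions, not Hadoop code: the TreeMap again plays the role of the shuffle sort, and DescendingSortSketch, DESCENDING and sortByCountDescending are hypothetical names. In a real job the comparator would typically extend Hadoop's WritableComparator and be registered through Job.setSortComparatorClass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of a descending sort comparator; no Hadoop classes are used.
class DescendingSortSketch {

    // Inverts the natural (ascending) order of the counts.
    static final Comparator<Integer> DESCENDING = (a, b) -> Integer.compare(b, a);

    // Reads "word<TAB>count" lines and returns them ordered by count, largest first.
    static List<String> sortByCountDescending(List<String> job1Output) {
        // Passing the comparator to the TreeMap mimics plugging a custom
        // comparator into the sort that happens before the reducer runs.
        TreeMap<Integer, List<String>> shuffled = new TreeMap<>(DESCENDING);
        for (String line : job1Output) {
            String[] parts = line.split("\t");
            int count = Integer.parseInt(parts[1]);
            shuffled.computeIfAbsent(count, k -> new ArrayList<>()).add(parts[0]);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : shuffled.entrySet()) {
            // The "reducer" only emits; the ordering was already decided by the comparator.
            for (String word : group.getValue()) {
                out.add(word + "\t" + group.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical output of the first (word-count) job.
        List<String> job1Output = Arrays.asList("hadoop\t3", "map\t1", "reduce\t2", "sort\t1");
        sortByCountDescending(job1Output).forEach(System.out::println);
    }
}
```

    With the same sample input, hadoop (3) now comes first, followed by reduce (2), then map and sort (1).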
