Sorted word count using Hadoop MapReduce

鱼传尺愫 2020-12-16 02:37

I'm very new to MapReduce and have completed the Hadoop word-count example.

That example produces an unsorted file of word counts (as key-value pairs). So is i

4 Answers

时光说笑 (original poster)

2020-12-16 02:48
As you said, one option is to chain two jobs.

First job: the plain word-count example.

Second job: does the sorting. It reads the output file of the first job as its input.

Pseudocode for the second job:
```
Mapper2(Text word, IntWritable count) {
    // Swap key and value: the framework sorts map output by key before
    // it reaches the reducer, so the count has to become the key.
    emit(count, word);
}

Reducer2(IntWritable count, Iterable<Text> words) {
    // All words that share the same count arrive in one reduce call.
    for each word in words {
        // Keys arrive in ascending order, so the output is sorted by count
        // (globally sorted only when the job runs with a single reducer).
        emit(word, count);
    }
}
```
You can also sort in descending order by writing a separate comparator class and registering it on the job:
```
// DescendingIntComparator is a hypothetical name for your own comparator class.
job.setSortComparatorClass(DescendingIntComparator.class);
```
With this comparator the framework sorts the keys (the counts) in descending order before they reach the reducer, so the reducer only has to emit them.
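To see the swap-and-sort idea in isolation, here is a minimal plain-Java sketch with no Hadoop dependency. It is an illustration only, not the Hadoop API: a `TreeMap` stands in for the shuffle phase, the `Comparator` argument stands in for the job's sort comparator, and the class name `SortByCountSketch`, the method `sortByCount`, and the sample counts are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortByCountSketch {

    // Simulates the second job: swap (word, count) to (count, word), let a
    // sorted map play the role of the shuffle, then emit (word, count) again.
    static List<String> sortByCount(Map<String, Integer> wordCounts,
                                    Comparator<Integer> keyOrder) {
        // "Map" phase: the count becomes the key; the TreeMap keeps the
        // keys ordered according to keyOrder, like the sort comparator would.
        TreeMap<Integer, List<String>> shuffled = new TreeMap<>(keyOrder);
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            shuffled.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                    .add(e.getKey());
        }

        // "Reduce" phase: words sharing a count arrive together; swap back.
        List<String> output = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> group : shuffled.entrySet()) {
            for (String word : group.getValue()) {
                output.add(word + "\t" + group.getKey());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Stands in for the output file of the first (word-count) job.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 3);
        counts.put("the", 7);
        counts.put("map", 3);
        counts.put("reduce", 1);

        // Ascending: the default key order.
        sortByCount(counts, Comparator.naturalOrder()).forEach(System.out::println);
        System.out.println("----");
        // Descending: what a custom sort comparator achieves.
        sortByCount(counts, Comparator.reverseOrder()).forEach(System.out::println);
    }
}
```

With the sample data, natural order lists `reduce` (count 1) first and `the` (count 7) last, and the reversed comparator flips that. The sketch keeps words with equal counts in insertion order; a real job makes no such promise about the order of values within one key.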