MapReduce (secondary) sorting / filtering - how?

忘了有多久 2021-02-06 19:43

I have a logfile of timestamped values (concurrent users) for different "zones" of a chatroom webapp, in the format "Timestamp; Zone; Value". For each zone exists one value pe

3 Answers
  •  难免孤独
    2021-02-06 20:03

    You can do this with a single MapReduce job using secondary sorting. Here are the steps:

    1. Define the key as the concatenation of zone, yyyy-mm-dd, and the value, i.e. zone:yyyy-mm-dd:value. As I will explain, you don't even need to emit any value from the mapper; NullWritable is good enough as the map output value.

    2. Implement a key (sort) comparator such that the zone:yyyy-mm-dd part of the key is ordered ascending and the value part is ordered descending. This ensures that, among all keys for a given zone:yyyy-mm-dd, the first key in the group has the highest value.

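    3. Define the partitioner and the grouping comparator of the composite key based on the zone and day part of the key only, i.e. zone:yyyy-mm-dd, so that all keys for one zone and day reach the same reducer and are handled in a single reduce call.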

    4. In your reducer input, you will get the first key of each key group, which contains the zone, the day, and the max value for that zone/day combination. The value part of the reducer input will be a list of NullWritable, which can be ignored.
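The four steps can be tried out without a cluster. Below is a minimal plain-Java sketch that only simulates the shuffle; it uses no Hadoop classes. The names `SecondarySortSketch`, `CompositeKey`, `SORT`, `GROUP`, `partition` and `maxPerZoneDay` are hypothetical, and the sketch assumes that each timestamp starts with `yyyy-mm-dd` and that values are integers, neither of which the question confirms. In a real job the two comparators and the partitioner would be registered through `Job.setSortComparatorClass`, `Job.setGroupingComparatorClass` and `Job.setPartitionerClass`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of the secondary-sort steps; no Hadoop classes are used.
final class SecondarySortSketch {

    // Step 1: composite key zone:yyyy-mm-dd:value. No separate map output value is needed.
    static final class CompositeKey {
        final String zone;
        final String day; // yyyy-mm-dd
        final int value;

        CompositeKey(String zone, String day, int value) {
            this.zone = zone;
            this.day = day;
            this.value = value;
        }
    }

    // Mapper stand-in: parses one "Timestamp; Zone; Value" line.
    // Assumption: the timestamp begins with yyyy-mm-dd.
    static CompositeKey map(String line) {
        String[] fields = line.split(";");
        String day = fields[0].trim().substring(0, 10);
        return new CompositeKey(fields[1].trim(), day, Integer.parseInt(fields[2].trim()));
    }

    // Step 2: sort comparator. Zone and day ascending, value descending.
    static final Comparator<CompositeKey> SORT = (a, b) -> {
        int c = a.zone.compareTo(b.zone);
        if (c == 0) c = a.day.compareTo(b.day);
        if (c == 0) c = Integer.compare(b.value, a.value); // descending
        return c;
    };

    // Step 3a: grouping comparator. Looks at zone and day only.
    static final Comparator<CompositeKey> GROUP = (a, b) -> {
        int c = a.zone.compareTo(b.zone);
        return c != 0 ? c : a.day.compareTo(b.day);
    };

    // Step 3b: partitioner stand-in. Hashes zone and day only, so every key
    // of one zone/day lands in the same partition regardless of its value.
    static int partition(CompositeKey k, int numReducers) {
        return ((k.zone + ":" + k.day).hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Step 4: reducer stand-in. After sorting, the first key of each group
    // already carries the maximum value, so only that key is emitted.
    static List<String> maxPerZoneDay(List<String> lines) {
        List<CompositeKey> keys = new ArrayList<>();
        for (String line : lines) {
            keys.add(map(line));
        }
        keys.sort(SORT); // stands in for the framework's shuffle/sort phase

        List<String> out = new ArrayList<>();
        CompositeKey prev = null;
        for (CompositeKey k : keys) {
            if (prev == null || GROUP.compare(prev, k) != 0) {
                out.add(k.zone + ":" + k.day + ":" + k.value);
            }
            prev = k;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
                "2021-02-06 10:00:00; lobby; 12",
                "2021-02-06 11:00:00; lobby; 30",
                "2021-02-06 10:00:00; games; 7",
                "2021-02-07 09:00:00; lobby; 5",
                "2021-02-06 12:00:00; games; 9");
        // prints [games:2021-02-06:9, lobby:2021-02-06:30, lobby:2021-02-07:5]
        System.out.println(maxPerZoneDay(log));
    }
}
```

The point of the design is that the reducer never compares values itself: the sort order does the work, and the grouping comparator only decides where one zone/day group ends and the next begins.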
