MapReduce (secondary) sorting / filtering - how?

忘了有多久 2021-02-06 19:43

I have a logfile of timestamped values (concurrent users) of different "zones" of a chatroom webapp in the format "Timestamp; Zone; Value". For each zone exists one value pe

3 Answers
  •  滥情空心
    2021-02-06 20:13

    I don't think you need two map/reduce steps. You could certainly do it with one; the only difference is that each result would be a list instead of a single entry. Otherwise, yes, you'd split it up by zone, then by date.

    I'd probably split it up by zone and have each zone return a list of its highest values per day, since the reduction is really easy at that point. A second map/reduce step only pays off with a really large dataset and a lot of machines to split across, and at that point I'd probably do the reduction on the entire key.
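
    As a rough illustration of the single-step approach, here is a minimal in-memory sketch in plain Python (no Hadoop or other framework). The sample lines, the `YYYY-MM-DD HH:MM:SS` timestamp layout, and the function names `map_line`, `reduce_zone` and `peaks_by_zone` are all assumptions made for the example; only the "Timestamp; Zone; Value" field order comes from the question.

```python
from collections import defaultdict

# Hypothetical sample lines in the "Timestamp; Zone; Value" format.
# The timestamp layout (YYYY-MM-DD HH:MM:SS) is an assumption.
LOG_LINES = [
    "2011-01-01 10:00:00; lobby; 12",
    "2011-01-01 11:00:00; lobby; 30",
    "2011-01-02 09:00:00; lobby; 7",
    "2011-01-01 10:00:00; games; 5",
    "2011-01-02 10:00:00; games; 9",
    "2011-01-02 11:00:00; games; 4",
]


def map_line(line):
    """Map step: emit (zone, (day, timestamp, value)) for one log line."""
    timestamp, zone, value = (field.strip() for field in line.split(";"))
    day = timestamp.split(" ")[0]  # date part of the assumed timestamp layout
    return zone, (day, timestamp, int(value))


def reduce_zone(records):
    """Reduce step: keep the highest-valued record for each day of one zone."""
    best = {}
    for day, timestamp, value in records:
        if day not in best or value > best[day][1]:
            best[day] = (timestamp, value)
    # The result for a zone is a list, one entry per day, sorted by day.
    return sorted((day, ts, v) for day, (ts, v) in best.items())


def peaks_by_zone(lines):
    """Run map, shuffle (group by zone) and reduce in a single pass."""
    grouped = defaultdict(list)
    for line in lines:
        zone, record = map_line(line)
        grouped[zone].append(record)
    return {zone: reduce_zone(records) for zone, records in grouped.items()}


result = peaks_by_zone(LOG_LINES)
```

    Each zone ends up with a list of per-day peaks rather than a single entry. Emitting `(zone, day)` as the key in `map_line` instead would turn this into the "reduce on the entire key" variant, where every reducer call returns exactly one record.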
