Hadoop MapReduce: Clarification on number of reducers

花落未央 2021-02-03 10:43

In the MapReduce framework, one reducer is used for each key generated by the mapper.

So you would think that specifying the number of Reducers in Hadoop MapReduce would

2 Answers
  •  栀梦 (original poster)
    2021-02-03 11:10

    To simplify @Judge Mental's (very accurate) answer a little bit: a single reducer task can process many keys, and the mapred.reduce.tasks=# parameter sets how many reducer tasks a specific job is split into. How many of those tasks actually run at the same time depends on the reduce capacity available in the cluster.

    An example if your mapred.reduce.tasks=10:
    You have 2,000 keys, each key with 5 values (for an evenly distributed 10,000 k:v pairs). Each reducer should be handling roughly 200 keys (1,000 k:v pairs).

    An example if your mapred.reduce.tasks=20:
    You have 2,000 keys, each key with 5 values (for an evenly distributed 10,000 k:v pairs). Each reducer should be handling roughly 100 keys (500 k:v pairs).

    In the examples above, the fewer keys each reducer has to work with, the faster the overall job will generally be, so long as you have the available reducer resources in the cluster, of course.
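
To make the key-to-reducer split concrete, here is a minimal sketch in plain Java, with no Hadoop dependency. It assumes the job uses hash partitioning with the same formula as Hadoop's default HashPartitioner (sign bit cleared, then modulo the number of reduce tasks). The class name ReducerSplit, the "key-N" key names, and the figure of 5 values per key (chosen so the total matches the 10,000 k:v pairs quoted above) are illustrative assumptions. Real jobs usually hash Writable keys such as Text, whose hash codes differ from String.hashCode(), so the exact per-reducer counts will not match a real cluster.

```java
import java.util.Arrays;

public class ReducerSplit {
    // Same formula as Hadoop's default hash partitioning:
    // clear the sign bit, then take the remainder by the reducer count.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Count how many distinct keys land on each reducer.
    static int[] keysPerReducer(int numKeys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int i = 0; i < numKeys; i++) {
            // "key-" + i is a made-up key name used only for this sketch.
            counts[partitionFor("key-" + i, numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int numKeys = 2000;    // distinct keys, as in the examples above
        int valuesPerKey = 5;  // assumption: 2,000 x 5 = 10,000 k:v pairs
        for (int reducers : new int[] {10, 20}) {
            int[] counts = keysPerReducer(numKeys, reducers);
            int avgKeys = numKeys / reducers;
            System.out.printf("reducers=%d avgKeys=%d avgPairs=%d%n",
                    reducers, avgKeys, avgKeys * valuesPerKey);
            System.out.println("  keys per reducer: " + Arrays.toString(counts));
        }
    }
}
```

The averages come out to 200 keys (1,000 pairs) per reducer for 10 reducers and 100 keys (500 pairs) for 20. The individual counts are only roughly even, because hashing does not guarantee a perfectly balanced split.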
