When do reduce tasks start in Hadoop?

In Hadoop, when do reduce tasks start? Do they start after a certain percentage (threshold) of mappers complete? If so, is this threshold fixed? What kind of threshold is typically used?

臣服心动

2020-11-27 10:24

Consider a WordCount example to better understand how a MapReduce job works. Suppose we have a large file, say a novel, and our task is to find the number of times each word occurs in it. Since the file is large, it is split into blocks that are replicated across different worker nodes. The WordCount job is composed of map and reduce tasks. Each map task takes one block as input and produces intermediate key-value pairs. Since we are counting occurrences of words, a mapper processing a block produces intermediate results of the form (word1,count1), (word2,count2), etc. The intermediate results of all the mappers are then passed through a shuffle phase, which partitions and sorts them by key so they can be handed to the reducers.
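A minimal Python sketch of the map step described above, assuming each block is simply a string of whitespace-separated words. The names `map_block` and `map_all` are hypothetical. Note that Hadoop's own WordCount example emits (word, 1) for every occurrence and sums the ones later (optionally in a combiner); this sketch pre-aggregates within each block to match the (word,count) pairs used in this answer.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def map_block(block: str) -> List[Tuple[str, int]]:
    # Hypothetical mapper: count every word in one block and emit one
    # (word, count) pair per distinct word, sorted by word.
    return sorted(Counter(block.split()).items())


def map_all(blocks: Iterable[str]) -> List[List[Tuple[str, int]]]:
    # In Hadoop each block would get its own map task, ideally running in
    # parallel on a node that holds the block; here they run one by one.
    return [map_block(b) for b in blocks]


if __name__ == "__main__":
    novel_blocks = ["it was what it was", "my word is my word"]  # made-up blocks
    for i, pairs in enumerate(map_all(novel_blocks), start=1):
        print(f"Map {i}: {pairs}")
```

Each inner list corresponds to the output of one mapper, like "Map 1" and "Map 2" in the example.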

Assume that our map output from different mappers is of the following form:

Map 1: (is,24) (was,32) (and,12)

Map 2: (my,12) (is,23) (was,30)

The map outputs are partitioned and sorted so that all values for the same key are given to the same reducer. Here, this means that all the pairs with key is go to one reducer, all the pairs with key was go to one reducer (possibly the same one), and so on. It is the reducer that produces the final output, which in this case would be: (and,12) (is,47) (my,12) (was,62)
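The shuffle and reduce steps can be sketched using the two map outputs from this example. This is a single-process simulation, not Hadoop code. The `partition` function is a hypothetical stand-in for Hadoop's default HashPartitioner (which uses the key's `hashCode()`); it sums character codes instead because Python's built-in string hash changes between runs.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical deterministic partitioner: same key -> same reducer index.
    return sum(map(ord, key)) % num_reducers


def shuffle(map_outputs: List[List[Pair]], num_reducers: int) -> List[Dict[str, List[int]]]:
    """Route every (key, value) pair to one reducer and group values by key."""
    buckets: List[Dict[str, List[int]]] = [defaultdict(list) for _ in range(num_reducers)]
    for output in map_outputs:
        for key, value in output:
            buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_all(buckets: List[Dict[str, List[int]]]) -> List[Pair]:
    results: List[Pair] = []
    for bucket in buckets:
        # Each reducer processes its keys in sorted order and sums the counts.
        for key in sorted(bucket):
            results.append((key, sum(bucket[key])))
    return sorted(results)


map1 = [("is", 24), ("was", 32), ("and", 12)]
map2 = [("my", 12), ("is", 23), ("was", 30)]
print(reduce_all(shuffle([map1, map2], num_reducers=2)))
# prints [('and', 12), ('is', 47), ('my', 12), ('was', 62)]
```

The result does not depend on the number of reducers; that only changes how the keys are spread across buckets.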

春和景丽

2020-11-27 10:34

The reduce phase starts only after all the mappers have finished their tasks: a reducer has to collect its share of the output from every mapper, so it has to wait until the last mapper has finished. However, each mapper's output starts being transferred to the reducers as soon as that mapper has completed its task.
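A simplified timeline model of one reducer, assuming made-up map completion times. The `slowstart` parameter is modelled on Hadoop's `mapreduce.job.reduce.slowstart.completedmaps` property (default 0.05), the fraction of maps that must complete before reduce tasks are scheduled. The rest is a simplification, not Hadoop's actual scheduler: a map's output is copied once that map is done and the reducer is running, and the reduce function itself waits for the last copy.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ReducerTimeline:
    launch: float             # when the (single) reduce task is scheduled
    fetch_times: List[float]  # when each map's output is copied, in map-finish order
    reduce_start: float       # earliest time the reduce function itself can run


def reducer_timeline(map_finish_times: List[float], slowstart: float = 0.05) -> ReducerTimeline:
    """Hypothetical model of one reducer; times are measured from job start (t = 0)."""
    if not map_finish_times:
        raise ValueError("need at least one map task")
    if not 0.0 <= slowstart <= 1.0:
        raise ValueError("slowstart must be between 0 and 1")
    finished = sorted(map_finish_times)
    needed = math.ceil(slowstart * len(finished))
    # With a threshold of 0 the reducer is scheduled right at job start.
    launch = 0.0 if needed == 0 else finished[needed - 1]
    # A map's output can only be copied once that map has finished
    # and the reduce task is already running.
    fetch_times = [max(t, launch) for t in finished]
    # The reduce function needs data from every mapper, so it waits for the last copy.
    reduce_start = max(fetch_times)
    return ReducerTimeline(launch, fetch_times, reduce_start)


if __name__ == "__main__":
    # Hypothetical finish times for four map tasks.
    print(reducer_timeline([3.0, 1.0, 4.0, 2.0], slowstart=0.5))
```

Whatever threshold is chosen, `reduce_start` always equals the finish time of the last map, which is the point made above; the threshold only moves the moment the reducer is scheduled and begins copying.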
