What does the shuffling phase actually do?
As shuffling is the process of bringing the mapper output to the reducer input, it just brings t
Mappers and Reducers are not separate machines, just separate code. Both the mapping code and the reducing code run on the same set of machines in the cluster.
So, after all machines in the cluster have run the mapper code, the results are:
Consider step 2 a "global grouping", because it is done in such a way that all values belonging to one key go to the single node assigned to that key.
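To make the "global grouping" idea concrete, here is a minimal single-process sketch. It is not how any particular framework implements shuffling; `assign_node`, `shuffle`, and the sample data are hypothetical names made up for illustration, and the node assignment is assumed to be "hash of the key modulo the number of nodes" (Hadoop's default HashPartitioner follows the same basic idea). `zlib.crc32` is used because Python's built-in `hash()` for strings changes between runs.

```python
import zlib
from collections import defaultdict

def assign_node(key: str, num_nodes: int) -> int:
    """Hypothetical partitioner: deterministic hash of the key, modulo node count."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def shuffle(mapper_outputs, num_nodes):
    """mapper_outputs holds one list of (key, value) pairs per machine.

    Returns one dict per node, mapping each key that node owns to the
    list of all values emitted for that key anywhere in the cluster.
    """
    nodes = [defaultdict(list) for _ in range(num_nodes)]
    for pairs in mapper_outputs:          # output of each machine's mapper
        for key, value in pairs:
            nodes[assign_node(key, num_nodes)][key].append(value)
    return [dict(n) for n in nodes]

# Assumed sample data: three machines, word-count style mapper output.
mapper_outputs = [
    [("apple", 1), ("pear", 1), ("apple", 1)],
    [("pear", 1), ("apple", 1)],
    [("fig", 1)],
]
shuffled = shuffle(mapper_outputs, num_nodes=3)
```

Whichever node ends up owning "apple", it holds every "apple" value from all three machines, and no other node holds any. That is the property the reducer relies on.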
Now, each node runs the Reducer code on the (key, value) pairs residing in its memory.
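As a rough illustration of this last step, the sketch below runs a reducer on each node's local data. The names `reducer` and `run_reducers`, the summing logic, and the sample data are assumptions chosen for a word-count style example, not part of any framework's API.

```python
def reducer(key, values):
    """Hypothetical reducer: sum all values collected for one key."""
    return key, sum(values)

def run_reducers(node_data):
    """node_data holds one dict per node: key -> list of values shuffled to it.

    Each node only ever looks at its own dict, so the reducers
    never need to talk to each other.
    """
    return [dict(reducer(k, vs) for k, vs in local.items()) for local in node_data]

# Assumed state after shuffling: every key lives on exactly one node.
node_data = [
    {"apple": [1, 1, 1]},
    {"pear": [1, 1]},
    {"fig": [1]},
]
results = run_reducers(node_data)
# results == [{"apple": 3}, {"pear": 2}, {"fig": 1}]
```

Because all values for a key were already brought to one node, each per-node result is final for its keys and the outputs can simply be concatenated.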