Hadoop partitioner

野性不改 · 2021-02-09 14:27

I want to ask about the Hadoop partitioner: is it implemented within the Mappers? How can I measure the performance of the default hash partitioner, and is there a better partitioner to use?

2 Answers
  •  挽巷 · 2021-02-09 14:47

    The Partitioner is a key component that sits between the Mappers and the Reducers: it distributes the map-emitted data among the Reducers.

    The Partitioner runs within every map task's JVM (Java process).

    The default partitioner, HashPartitioner, is based on a hash function and is very fast compared to other partitioners such as TotalOrderPartitioner. It applies the hash function to every map output key, i.e.:

    Reduce_Number = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
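    For reference, here is a minimal sketch of what the built-in HashPartitioner amounts to; the class name HashPartitionerSketch is mine, the real class lives in org.apache.hadoop.mapreduce.lib.partition:

        import org.apache.hadoop.mapreduce.Partitioner;

        // Minimal sketch: mask the sign bit of the key's hash code,
        // then take it modulo the number of reduce tasks.
        public class HashPartitionerSketch<K, V> extends Partitioner<K, V> {
            @Override
            public int getPartition(K key, V value, int numReduceTasks) {
                return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            }
        }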
    

    To check the performance of the HashPartitioner, look at the reduce task counters and see how the records were distributed among the reducers.
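    As a sketch of how that check might be scripted, assuming a completed org.apache.hadoop.mapreduce.Job object named job (exact getter names can vary slightly across Hadoop versions):

        import java.io.IOException;

        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.TaskCounter;
        import org.apache.hadoop.mapreduce.TaskReport;
        import org.apache.hadoop.mapreduce.TaskType;

        public class ReduceSkewReport {
            // Print how many input records each reducer received, which
            // makes an uneven (skewed) distribution easy to spot.
            public static void print(Job job) throws IOException, InterruptedException {
                for (TaskReport report : job.getTaskReports(TaskType.REDUCE)) {
                    long records = report.getTaskCounters()
                                         .findCounter(TaskCounter.REDUCE_INPUT_RECORDS)
                                         .getValue();
                    System.out.println(report.getTaskId() + " -> " + records + " input records");
                }
            }
        }

    The same per-reducer breakdown is also visible in the Job History web UI under each reduce task's counters.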

    HashPartitioner is a basic partitioner and is not well suited for processing data with high skew.

    To address data skew problems, we need to write a custom partitioner class that extends the Partitioner class from the MapReduce API.

    One example of such a custom partitioner is a RandomPartitioner, which is one of the best ways to distribute skewed data evenly among the reducers.
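    A minimal sketch of that idea follows; the class name SaltedRandomPartitioner and the hot key "popular-key" are made up for illustration. A known hot key is scattered randomly across all reducers, and everything else falls back to the HashPartitioner formula above:

        import java.util.Random;

        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Partitioner;

        // Illustrative skew-handling partitioner: spread one hot key over all
        // reducers at random, hash the remaining keys as usual.
        public class SaltedRandomPartitioner extends Partitioner<Text, IntWritable> {

            private static final String HOT_KEY = "popular-key"; // hypothetical skewed key
            private final Random random = new Random();

            @Override
            public int getPartition(Text key, IntWritable value, int numReduceTasks) {
                if (HOT_KEY.equals(key.toString())) {
                    // The hot key's values no longer meet in one reducer, so a
                    // second aggregation pass is usually needed afterwards.
                    return random.nextInt(numReduceTasks);
                }
                return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            }
        }

    It would be plugged into the job with job.setPartitionerClass(SaltedRandomPartitioner.class).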
