Hadoop partitioner

野性不改 · 2021-02-09 14:27

I want to ask about the Hadoop partitioner: is it implemented within the Mappers? How can I measure the performance of the default hash partitioner, and is there a better partitioner to use?

2 Answers
  •  挽巷 · 2021-02-09 14:47

    The Partitioner is a key component that sits between the Mappers and the Reducers: it distributes the map-emitted data among the Reducers.

    The Partitioner runs within every map task's JVM (Java process).

    The default partitioner, HashPartitioner, is based on a hash function and is very fast compared to other partitioners such as TotalOrderPartitioner. It applies the hash function to every map output key, i.e.:

    Reduce_Number = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
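    For reference, here is a minimal sketch of what the built-in HashPartitioner amounts to; the class name HashPartitionerSketch is mine, the real class lives in org.apache.hadoop.mapreduce.lib.partition:

        import org.apache.hadoop.mapreduce.Partitioner;

        // Minimal sketch: mask the sign bit of the key's hash code,
        // then take it modulo the number of reduce tasks.
        public class HashPartitionerSketch<K, V> extends Partitioner<K, V> {
            @Override
            public int getPartition(K key, V value, int numReduceTasks) {
                return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            }
        }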
    

    To check the performance of the HashPartitioner, look at the reduce task counters and see how the records were distributed among the reducers.
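    As a sketch of how that check might be scripted, assuming a completed org.apache.hadoop.mapreduce.Job object named job (exact getter names can vary slightly across Hadoop versions):

        import java.io.IOException;

        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.TaskCounter;
        import org.apache.hadoop.mapreduce.TaskReport;
        import org.apache.hadoop.mapreduce.TaskType;

        public class ReduceSkewReport {
            // Print how many input records each reducer received, which
            // makes an uneven (skewed) distribution easy to spot.
            public static void print(Job job) throws IOException, InterruptedException {
                for (TaskReport report : job.getTaskReports(TaskType.REDUCE)) {
                    long records = report.getTaskCounters()
                                         .findCounter(TaskCounter.REDUCE_INPUT_RECORDS)
                                         .getValue();
                    System.out.println(report.getTaskId() + " -> " + records + " input records");
                }
            }
        }

    The same per-reducer breakdown is also visible in the Job History web UI under each reduce task's counters.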

    HashPartitioner is a basic partitioner and is not well suited for processing data with high skew.

    To address data skew problems, we need to write a custom partitioner class that extends the Partitioner class from the MapReduce API.

    One example of such a custom partitioner is a RandomPartitioner, which is one of the best ways to distribute skewed data evenly among the reducers.
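    A minimal sketch of that idea follows; the class name SaltedRandomPartitioner and the hot key "popular-key" are made up for illustration. A known hot key is scattered randomly across all reducers, and everything else falls back to the HashPartitioner formula above:

        import java.util.Random;

        import org.apache.hadoop.io.IntWritable;
        import org.apache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Partitioner;

        // Illustrative skew-handling partitioner: spread one hot key over all
        // reducers at random, hash the remaining keys as usual.
        public class SaltedRandomPartitioner extends Partitioner<Text, IntWritable> {

            private static final String HOT_KEY = "popular-key"; // hypothetical skewed key
            private final Random random = new Random();

            @Override
            public int getPartition(Text key, IntWritable value, int numReduceTasks) {
                if (HOT_KEY.equals(key.toString())) {
                    // The hot key's values no longer meet in one reducer, so a
                    // second aggregation pass is usually needed afterwards.
                    return random.nextInt(numReduceTasks);
                }
                return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
            }
        }

    It would be plugged into the job with job.setPartitionerClass(SaltedRandomPartitioner.class).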
