Partitioning: how does Hadoop do it? Does it use a hash function? What is the default function?


Partitioning is the process of determining which reducer instance will receive which intermediate keys and values. Each mapper must determine for all of its output (key, val


1 Answer

梦谈多话

2021-01-01 01:27
The default partitioner in Hadoop is HashPartitioner, which has a single method, getPartition. It computes key.hashCode() & Integer.MAX_VALUE, which clears the sign bit so the value is never negative, and then takes that value modulo the number of reduce tasks.

For example, if there are 10 reduce tasks, getPartition returns a value from 0 through 9 for every key, and equal keys always get the same value.

Here is the code:
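```
import org.apache.hadoop.mapreduce.Partitioner;

public class HashPartitioner<K, V> extends Partitioner<K, V> {
    // Clear the sign bit so the hash is non-negative, then map it into [0, numReduceTasks).
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}
```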
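To see why the mask matters, here is a minimal standalone sketch of the same arithmetic in plain Java, so it runs without Hadoop on the classpath. The class name HashPartitionDemo and the helper partitionFor are made up for illustration; only the expression inside partitionFor comes from the code above.

```java
class HashPartitionDemo {
    // Same arithmetic as getPartition: clear the sign bit, then take the modulus.
    // (Hypothetical helper; not part of Hadoop.)
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 10;

        // Every key lands in one of the partitions 0..9.
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> " + partitionFor(key, numReduceTasks));
        }

        // An Integer's hashCode is its own value, so this key has a negative hash.
        Integer negativeHashKey = Integer.MIN_VALUE;
        // Without the mask, Java's % keeps the sign of the dividend: prints -8.
        System.out.println("unmasked: " + (negativeHashKey.hashCode() % numReduceTasks));
        // With the mask, the sign bit is cleared first: prints 0.
        System.out.println("masked:   " + partitionFor(negativeHashKey, numReduceTasks));
    }
}
```

A negative partition number would not correspond to any reducer, which is why the hash is masked before the modulus. Math.abs would not be a safe substitute here, because Math.abs(Integer.MIN_VALUE) is still negative.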
To create a custom partitioner, extend Partitioner, override getPartition, and register the class in the driver code with job.setPartitionerClass(CustomPartitioner.class). This is particularly helpful for secondary sort operations, for example.
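As a rough illustration of the secondary sort case, the sketch below shows only the partitioning logic, written as plain Java so it runs without Hadoop. It assumes a made-up composite key of the form "naturalKey#secondaryField"; the class name, the helper partitionFor, and the '#' separator are all hypothetical. In a real job this logic would sit inside the getPartition method of a class that extends Partitioner.

```java
class NaturalKeyPartitionDemo {
    // Partition on the natural key only, ignoring the secondary field, so that
    // all records sharing a natural key are sent to the same reducer.
    // Assumed (hypothetical) key layout: "<naturalKey>#<secondaryField>".
    static int partitionFor(String compositeKey, int numReduceTasks) {
        int sep = compositeKey.indexOf('#');
        String naturalKey = sep >= 0 ? compositeKey.substring(0, sep) : compositeKey;
        return (naturalKey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        String first = "user42#2021-01-01";
        String second = "user42#2021-06-30";

        // Both records share the natural key "user42", so they get the same partition.
        System.out.println(first + " -> " + partitionFor(first, numReduceTasks));
        System.out.println(second + " -> " + partitionFor(second, numReduceTasks));
    }
}
```

Hashing the whole composite key instead would usually scatter records with the same natural key across different reducers, which defeats the purpose of a secondary sort.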