Alternative to the default HashPartitioner provided with Hadoop

Posted by 瘦欲@ on 2019-12-11 07:03:49

Question


I have a Hadoop MapReduce program that distributes keys unevenly. Some reducers end up with two keys, some with one key, and some with none. How do I force Hadoop to send each key to a separate reducer? I have nine unique keys of the form:

0,0
0,1
0,2
1,0
1,1
1,2
2,0
2,1
2,2

I set job.setNumReduceTasks(9), but the HashPartitioner seems to hash two keys to the same hash code, so both keys are sent to the same reducer while other reducers sit idle.

Would a random partitioner resolve this? Would it send each unique key to a random reducer and guarantee that each reducer receives a single key? How do I enable it in place of the default?

EDIT:

Can someone please explain why my output looks like this?

-rw-r--r--   1 user supergroup          0 2018-04-19 18:58 outbin9/_SUCCESS
drwxr-xr-x   - user supergroup          0 2018-04-19 18:57 outbin9/_logs
-rw-r--r--   1 user supergroup        869 2018-04-19 18:57 outbin9/part-r-00000
-rw-r--r--   1 user supergroup       1562 2018-04-19 18:57 outbin9/part-r-00001
-rw-r--r--   1 user supergroup        913 2018-04-19 18:58 outbin9/part-r-00002
-rw-r--r--   1 user supergroup       1771 2018-04-19 18:58 outbin9/part-r-00003
-rw-r--r--   1 user supergroup        979 2018-04-19 18:58 outbin9/part-r-00004
-rw-r--r--   1 user supergroup        880 2018-04-19 18:58 outbin9/part-r-00005
-rw-r--r--   1 user supergroup          0 2018-04-19 18:58 outbin9/part-r-00006
-rw-r--r--   1 user supergroup          0 2018-04-19 18:58 outbin9/part-r-00007
-rw-r--r--   1 user supergroup        726 2018-04-19 18:58 outbin9/part-r-00008

The two larger files, part-r-00001 and part-r-00003, received keys 1,0 and 2,2 and keys 0,0 and 1,2, respectively. Notice that part-r-00006 and part-r-00007 are empty.


Answer 1:


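HashPartitioner is the default partitioner in Hadoop. It does not create one reduce task per unique key; it assigns each key to partition (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. All the values with the same key go to the same instance of your reducer, in a single call to the reduce function, but two different keys can still land in the same partition, which leaves other reducers with no input.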
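
The collisions in the question can be reproduced outside Hadoop. HashPartitioner picks the partition as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, so the sketch below only has to imitate the key's hash. It assumes the job emits its keys as Text, which hashes its bytes one at a time starting from 1; the class and method names (HashPartitionDemo, textLikeHash, partitionFor, assign) are made up for this illustration. Under that assumption, 1,0 and 2,2 share partition 1, 0,0 and 1,2 share partition 3, and partitions 6 and 7 get nothing, which lines up with the listing in the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java re-creation of the default partitioning arithmetic (no Hadoop needed).
class HashPartitionDemo {

    // Imitates the byte-wise hash of a Hadoop Text key.
    // Assumption: the job's key type is Text; other key types hash differently.
    static int textLikeHash(String key) {
        int hash = 1;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + b;
        }
        return hash;
    }

    // Same formula as HashPartitioner: clear the sign bit, then take the modulus.
    static int partitionFor(String key, int numReduceTasks) {
        return (textLikeHash(key) & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Groups the nine keys from the question by the partition they land in.
    static Map<Integer, List<String>> assign(int numReduceTasks) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                String key = i + "," + j;
                int partition = partitionFor(key, numReduceTasks);
                byPartition.computeIfAbsent(partition, p -> new ArrayList<>()).add(key);
            }
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // Partitions 1 and 3 each hold two keys; 6 and 7 never appear.
        System.out.println(assign(9));
    }
}
```

Nine distinct keys spread over nine partitions by a modulus have no reason to fall one per partition, so this outcome is ordinary behaviour of a hash partitioner rather than a fault in the job.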

If you want particular groups of results to go to different reducers, you can write your own partitioner implementation. It can be general purpose or tailored to the specific data types or values you expect in your application.

A custom partitioner lets you route results to different reducers based on a condition you define. Because the partition is computed from the key, records for the same key are guaranteed to go to the same reducer, so exactly one reducer receives all the records for that particular key.
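
For the nine keys in the question, one possible condition is to read the key as "row,col" and use row * 3 + col as the partition number. The sketch below shows only that arithmetic in plain Java so it can be run without Hadoop. GridKeyPartitioning, GRID_WIDTH and GridKeyPartitioner are hypothetical names, and the Text key and value types in the commented wrapper are assumptions about the job.

```java
// Plain-Java sketch of the index arithmetic a custom partitioner could use.
class GridKeyPartitioning {

    // Assumption: the grid is 3 wide, matching the nine keys "0,0" .. "2,2".
    static final int GRID_WIDTH = 3;

    // Maps "row,col" to row * GRID_WIDTH + col, folded into [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        String[] parts = key.trim().split(",");
        int row = Integer.parseInt(parts[0].trim());
        int col = Integer.parseInt(parts[1].trim());
        return (row * GRID_WIDTH + col) % numPartitions;
    }

    public static void main(String[] args) {
        for (int row = 0; row < 3; row++) {
            for (int col = 0; col < 3; col++) {
                String key = row + "," + col;
                System.out.println(key + " -> " + partitionFor(key, 9));
            }
        }
    }
}

// In a real job the same arithmetic would sit inside a Partitioner subclass, roughly:
//
//   public class GridKeyPartitioner extends Partitioner<Text, Text> {
//       @Override
//       public int getPartition(Text key, Text value, int numPartitions) {
//           return GridKeyPartitioning.partitionFor(key.toString(), numPartitions);
//       }
//   }
//
//   job.setPartitionerClass(GridKeyPartitioner.class);
//   job.setNumReduceTasks(9);
```

With nine reduce tasks this gives each key its own partition. A partitioner that picked a reducer at random would not achieve the same thing: records sharing a key could be scattered across several reducers, and nothing would stop two keys from landing on the same one.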

sample example link



Source: https://stackoverflow.com/questions/49933268/alternative-to-the-default-hashpartioner-provided-with-hadoop
