Question
I have the following record set to process:
1000, 1001, 1002 to 1999,
2000, 2001, 2002 to 2999,
3000, 3001, 3002 to 3999
I want to process this record set in Hive so that reducer-1 handles records 1000 to 1999, reducer-2 handles records 2000 to 2999, and reducer-3 handles records 3000 to 3999. How can I achieve this?
Answer 1:
Use DISTRIBUTE BY. Mapper output is grouped by the value of the DISTRIBUTE BY expression, and all rows that share a value are sent to the same reducer for processing:
select ...
from ...
distribute by case when col between 1000 and 1999 then 1
when col between 2000 and 2999 then 2
when col between 3000 and 3999 then 3
end
Or simply
distribute by floor(col/1000)
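
Note that DISTRIBUTE BY only guarantees that rows with the same key land on the same reducer; which reducer that is depends on the key's hash and on the number of reducers (which can be set with, for example, `SET mapreduce.job.reduces=3;`). The following is a minimal Python sketch of that mapping, not Hive code. It assumes the default partitioning takes the key's hash modulo the reducer count, and that a small non-negative integer hashes to itself; the function names are hypothetical.

```python
def distribute_key(col: int) -> int:
    """Same expression as the answer: floor(col / 1000)."""
    return col // 1000


def reducer_for(key: int, num_reducers: int) -> int:
    # Assumption: reducer index = (hash & MAX_INT) % num_reducers,
    # and the hash of a small non-negative integer is the integer itself.
    return (key & 0x7FFFFFFF) % num_reducers


def assign(cols, num_reducers):
    """Group input values by the reducer index they would be routed to."""
    out = {}
    for col in cols:
        idx = reducer_for(distribute_key(col), num_reducers)
        out.setdefault(idx, []).append(col)
    return out


if __name__ == "__main__":
    sample = [1000, 1999, 2000, 2999, 3000, 3999]
    # With 3 reducers, each range gets its own reducer.
    print(assign(sample, 3))
    # With 2 reducers, two of the ranges end up sharing one reducer.
    print(assign(sample, 2))
```

Under these assumptions, three reducers give each range its own reducer (though the reducer indices need not match the 1, 2, 3 labels in the question), while any smaller reducer count forces some ranges to share a reducer.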
Source: https://stackoverflow.com/questions/59882425/reducer-selection-in-hive