问题
In Hive I ofter do queries like:
select columnA, sum(columnB) from ... group by ...
I read some mapreduce example and one reducer can only produce one key. It seems the number of reducers completely depends on number of keys in columnA.
Therefore, why could hive set number of reducers manully?
If there are 10 different values in columnA and I set number of reducers to 2, what will happen? Each reducers will be reused 5 times?
If there are 10 different values in columnA and I set number of reducers to 20, what will happen? hive will only generate 10 reducers?
回答1:
Normally you should not set the exact number of reducers manually. Use bytes.per.reducer
instead:
--The number of reduce tasks determined at compile time
--Default size is 1G, so if the input size estimated is 10G then 10 reducers will be used
set hive.exec.reducers.bytes.per.reducer=67108864;
If you want to limit cluster usage by job reducers, you can set this property: hive.exec.reducers.max
If you are running on Tez, at execution time Hive can dynamically set the number of reducers if this property is set:
set hive.tez.auto.reducer.parallelism = true;
In this case the number of reducers initially started may be bigger because it was estimated based on size, at runtime extra reducers can be removed.
One reducer can process many keys, it depends on data size and bytes.per.reducer and reducer limit configuration settings. The same keys will pass to the same reducer in case of query like in your example because each reducer container is running isolated and all rows having particular key need to be passed to single reducer to be able calculate count for this key.
Extra reducers can be forced (mapreduce.job.reducers=N
) or started automatically based on wrong estimation(because of stale stats) and if not removed at run-time, they will do nothing and finish quickly because there is nothing to process. But such reducers anyway will be scheduled and containers allocated, so better do not force extra reducers and keep stats fresh for better estimation.
来源:https://stackoverflow.com/questions/62369975/what-will-happen-if-hive-number-of-reducers-is-different-to-number-of-keys