how to reduce the number of containers in the query

混江龙づ霸主 提交于 2019-12-01 10:05:39

问题


I have a query using to much containers and to much memory. (97% of the memory used). Is there a way to set the number of containers used in the query and limit the max memory? The query is running on Tez.

Thanks in advance


回答1:


Controlling the number of Mappers:

The number of mappers depends on various factors such as how the data is distributed among nodes, input format, execution engine and configuration params. See also How initial task parallelism works

MR uses CombineInputFormat, while Tez uses grouped splits.

Tez:

set tez.grouping.min-size=16777216; -- 16 MB min split
set tez.grouping.max-size=1073741824; -- 1 GB max split

Increase these figures to reduce the number of mappers running.

Also Mappers are running on data nodes where the data is located, that is why manually controlling the number of mappers is not an easy task, not always possible to combine input.

Controlling the number of Reducers:

The number of reducers determined according to

mapreduce.job.reduces
  • The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop set this to 1 by default, whereas Hive uses -1 as its default value. By setting this property to -1, Hive will automatically figure out what should be the number of reducers.

hive.exec.reducers.bytes.per.reducer - The default in Hive 0.14.0 and earlier is 1 GB.

Also hive.exec.reducers.max - Maximum number of reducers that will be used. If mapreduce.job.reduces is negative, Hive will use this as the maximum number of reducers when automatically determining the number of reducers.

Simply set hive.exec.reducers.max=<number> to limit the number of reducers running.

If you want to increase reducers parallelism, increase hive.exec.reducers.max and decrease hive.exec.reducers.bytes.per.reducer.

Memory settings

set tez.am.resource.memory.mb=8192;
set tez.am.java.opts=-Xmx6144m;
set tez.reduce.memory.mb=6144;
set hive.tez.container.size=9216;
set hive.tez.java.opts=-Xmx6144m;

The default settings mean that the actual Tez task will use the mapper's memory setting:

hive.tez.container.size = mapreduce.map.memory.mb
hive.tez.java.opts = mapreduce.map.java.opts

Read this for more details: Demystify Apache Tez Memory Tuning - Step by Step

I would suggest to optimize query first. Use map-joins if possible, use vectorising execution, add distribute by partitin key if you are writing partitioned table to reduce memory consumption on reducers and write good sql of course.



来源:https://stackoverflow.com/questions/55447168/how-to-reduce-the-number-of-containers-in-the-query

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!