Apache Hadoop Yarn - Underutilization of cores


The problem lies not with yarn-site.xml or spark-defaults.conf, but with the resource calculator that assigns cores to the executors, or, in the case of MapReduce jobs, to the mappers/reducers.

The default resource calculator, i.e. org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, uses only memory information when allocating containers; CPU scheduling is not enabled by default. To take both memory and CPU into account, the resource calculator needs to be changed to org.apache.hadoop.yarn.util.resource.DominantResourceCalculator in the capacity-scheduler.xml file.

Here's what needs to change.

<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
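
As a side note, the same calculator also governs MapReduce jobs, as mentioned above. Once the DominantResourceCalculator is in place, the per-task vcore request can also be raised in mapred-site.xml; the values below are only illustrative (the default is 1 vcore per task):

<property>
    <name>mapreduce.map.cpu.vcores</name>
    <value>2</value>
</property>
<property>
    <name>mapreduce.reduce.cpu.vcores</name>
    <value>2</value>
</property>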

I had a similar kind of issue: in my code I was setting spark.executor.cores to 5, yet each executor was only getting 1 core, which is the default. In the Spark UI, the Environment tab showed 5 cores, but when checking the Executors tab I could see only 1 task in the RUNNING state per executor. I was using Spark version 1.6.3.

So I then passed the setting on the spark-submit command line as --conf spark.executor.cores=5, which works fine and actually uses 5 cores,

or just

--executor-cores 5, which also works.
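
For reference, a complete spark-submit invocation using either form might look like the following sketch; the application class, jar name, and resource numbers are placeholders, not values from the original setup.

spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 4 \
    --executor-memory 4g \
    --executor-cores 5 \
    --class com.example.MyApp \
    my-app.jar

The --conf spark.executor.cores=5 form is equivalent to --executor-cores 5; either way, the vcore request is only reflected in YARN's container allocation once the DominantResourceCalculator change described above is in place.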
