I have a Hadoop cluster with 5 nodes, each of which has 12 cores and 32 GB of memory. I use YARN as the MapReduce framework, so I have the following YARN settings:
Executors take 10 cores each, plus 2 cores for the Application Master, which adds up to 42 vCores requested when you only have 40 vCores in total.
Reduce the executor cores to 8, and make sure to restart each NodeManager.
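For example, a rough sketch of the submit command with the reduced core count; the executor count, memory, and application name are assumptions for illustration, not values from the question:

# Assumed: 4 executors (4 x 10 + 2 = 42 vCores was the over-request above);
# dropping to 8 cores per executor gives 4 x 8 + 2 = 34 vCores, which fits in 40.
spark-submit \
  --master yarn \
  --num-executors 4 \
  --executor-cores 8 \
  --executor-memory 6g \
  your_app.py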
Also modify yarn-site.xml and set these properties (an illustrative snippet follows the list):
yarn.scheduler.minimum-allocation-mb
yarn.scheduler.maximum-allocation-mb
yarn.scheduler.minimum-allocation-vcores
yarn.scheduler.maximum-allocation-vcores
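A sketch of these properties in yarn-site.xml; the values are assumptions for 12-core / 32 GB nodes and should be tuned to your hardware:

<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>   <!-- smallest memory allocation YARN will grant per container -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>24576</value>  <!-- largest single container; keep at or below the NodeManager's memory -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-vcores</name>
  <value>1</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>8</value>      <!-- matches the reduced executor-cores suggested above -->
</property>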
I was wondering the same thing, but changing the resource calculator worked for me.
This is how I set the property:
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
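Note that yarn.scheduler.capacity.resource-calculator lives in capacity-scheduler.xml. Assuming the CapacityScheduler is in use, the scheduler configuration can usually be reloaded without a full restart (though on some Hadoop versions a ResourceManager restart may still be needed for this particular change):

yarn rmadmin -refreshQueues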
Check in the YARN UI how many containers and vCores are assigned to the application. With this change, the number of containers should be executors + 1, and the vCores should be (executor-cores * num-executors) + 1.
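As a hypothetical check, using the 4 executors at 8 cores each assumed earlier:

containers = num-executors + 1 = 4 + 1 = 5
vcores     = (executor-cores * num-executors) + 1 = (8 * 4) + 1 = 33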
Without setting the YARN scheduler to FairScheduler, I was seeing the same thing. However, the Spark UI showed the right number of tasks, which suggested nothing was actually wrong, and my cluster's CPU usage was close to 100%, which confirmed it.
After switching to FairScheduler, the resources reported by YARN looked correct.
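For reference, the scheduler is switched in yarn-site.xml; a minimal sketch:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>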