Why does vcore always equal the number of nodes in Spark on YARN?

Asked by 醉话见心 on 2020-12-28 15:58

I have a Hadoop cluster with 5 nodes, each of which has 12 cores and 32 GB of memory. I use YARN as the MapReduce framework, so I have the following YARN settings:

    <
3 Answers
  • 2020-12-28 16:43

    Your executors request 10 cores each, plus 2 cores for the Application Master, which comes to 42 cores requested when you only have 40 vCores in total.

    Reduce the executor cores to 8 and make sure to restart each NodeManager.

    Also modify yarn-site.xml and set these properties (an illustrative snippet follows after the list):

    yarn.scheduler.minimum-allocation-mb
    yarn.scheduler.maximum-allocation-mb
    yarn.scheduler.minimum-allocation-vcores
    yarn.scheduler.maximum-allocation-vcores
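
    For reference, a minimal yarn-site.xml sketch of those four properties is below; the values are only illustrative for a 12-core, 32 GB node and would need to be tuned to your cluster:

        <!-- yarn-site.xml: per-container allocation limits (example values, not the poster's) -->
        <property>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>1024</value>
        </property>
        <property>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>28672</value>
        </property>
        <property>
            <name>yarn.scheduler.minimum-allocation-vcores</name>
            <value>1</value>
        </property>
        <property>
            <name>yarn.scheduler.maximum-allocation-vcores</name>
            <value>12</value>
        </property>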
    
  • 2020-12-28 16:46

    I was wondering the same thing, but changing the resource calculator worked for me.
    This is how I set the property:

        <property>
            <name>yarn.scheduler.capacity.resource-calculator</name>      
            <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>       
        </property>
    

    Check in the YARN UI how many containers and vcores are assigned to the application. With this change, the number of containers should be executors + 1, and the vcores should be (executor-cores * num-executors) + 1.
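
    For example, with hypothetical values of --num-executors 4 and --executor-cores 8, you would expect 4 + 1 = 5 containers and (8 * 4) + 1 = 33 vcores reported by YARN.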

  • 2020-12-28 16:49

    I was seeing the same thing without the YARN scheduler set to FairScheduler. The Spark UI still showed the right number of tasks, though, which suggested nothing was actually wrong, and the cluster running at close to 100% CPU usage confirmed it.

    After switching to FairScheduler, the resource numbers reported by YARN looked correct.
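
    In case it is useful, this is roughly how the ResourceManager scheduler is switched in yarn-site.xml; the property and class names are the standard ones, but treat it as a sketch rather than the poster's exact configuration:

        <!-- yarn-site.xml: switch the ResourceManager to the FairScheduler -->
        <property>
            <name>yarn.resourcemanager.scheduler.class</name>
            <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
        </property>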
