Question:
My use-case:
- We have a long-running Spark job, hereafter called LRJ. This job runs once a week.
- We have multiple small jobs that can arrive at any time. These jobs have higher priority than the long-running job.
To address this, we created YARN queues for resource management: queue Q1 for the long-running job and queue Q2 for the small jobs.
Config:
Q1 : capacity = 50% and it can go up to 100%
     capacity on CORE nodes = 50%, maximum 100%
Q2 : capacity = 50% and it can go up to 100%
     capacity on CORE nodes = 50%, maximum 100%
Issue we are facing:
When the LRJ is in progress, it acquires all the resources, and the small jobs have to wait until it releases them. Once the cluster scales up and new resources become available, the small jobs get resources. However, because scaling up the cluster takes time, this creates a significant delay in allocating resources to these jobs.
Update 1:
We have tried using the maximum-capacity config as per the YARN docs, but it's not working, as I posted in my other question here.
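For reference, a sketch of what that attempt looked like, based on the 50%/100% queue config described above (property values are the figures mentioned earlier, not a verified working setup):
yarn.scheduler.capacity.root.Q1.capacity: 50
yarn.scheduler.capacity.root.Q1.maximum-capacity: 100
yarn.scheduler.capacity.root.Q2.capacity: 50
yarn.scheduler.capacity.root.Q2.maximum-capacity: 100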
Answer 1:
After more analysis, which involved discussions with some unsung heroes, we decided to apply preemption on the YARN queues for our use-case.
Jobs on the Q1 queue will be preempted when the following sequence of events occurs:
- The Q1 queue is using more than its specified capacity (example: the LRJ is using more resources than the capacity specified for the queue).
- Jobs suddenly get scheduled on the Q2 queue (example: multiple small jobs get triggered at once).
To understand preemption, read this and this
Following is the sample configuration that we are using in our AWS CloudFormation script to launch an EMR cluster:
Capacity-Scheduler configuration:
yarn.scheduler.capacity.resource-calculator: org.apache.hadoop.yarn.util.resource.DominantResourceCalculator
yarn.scheduler.capacity.root.queues: Q1,Q2
yarn.scheduler.capacity.root.Q2.capacity: 60
yarn.scheduler.capacity.root.Q1.capacity: 40
yarn.scheduler.capacity.root.Q2.accessible-node-labels: "*"
yarn.scheduler.capacity.root.Q1.accessible-node-labels: "*"
yarn.scheduler.capacity.root.accessible-node-labels.CORE.capacity: 100
yarn.scheduler.capacity.root.Q2.accessible-node-labels.CORE.capacity: 60
yarn.scheduler.capacity.root.Q1.accessible-node-labels.CORE.capacity: 40
yarn.scheduler.capacity.root.Q1.accessible-node-labels.CORE.maximum-capacity: 60
yarn.scheduler.capacity.root.Q2.disable_preemption: true
yarn.scheduler.capacity.root.Q1.disable_preemption: false
yarn-site configuration:
yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
yarn.resourcemanager.scheduler.monitor.enable: true
yarn.resourcemanager.scheduler.monitor.policies: org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy
yarn.resourcemanager.monitor.capacity.preemption.monitoring_interval: 2000
yarn.resourcemanager.monitor.capacity.preemption.max_wait_before_kill: 3000
yarn.resourcemanager.monitor.capacity.preemption.total_preemption_per_round: 0.5
yarn.resourcemanager.monitor.capacity.preemption.max_ignored_over_capacity: 0.1
yarn.resourcemanager.monitor.capacity.preemption.natural_termination_factor: 1
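For context, here is a sketch of how these two classifications might be wired into the CloudFormation template via the EMR Configurations property (the resource name EmrCluster is illustrative, and the property lists are abbreviated; the real template carries the full lists above):
EmrCluster:
  Type: AWS::EMR::Cluster
  Properties:
    # ... other cluster properties (name, instances, roles, etc.) ...
    Configurations:
      - Classification: capacity-scheduler
        ConfigurationProperties:
          yarn.scheduler.capacity.root.queues: "Q1,Q2"
          yarn.scheduler.capacity.root.Q1.capacity: "40"
          # ... remaining capacity-scheduler keys from above ...
      - Classification: yarn-site
        ConfigurationProperties:
          yarn.resourcemanager.scheduler.monitor.enable: "true"
          # ... remaining yarn-site keys from above ...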
With the above in place, you have to submit each job to the appropriate queue based on your use-case, for example as sketched below.
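A minimal spark-submit sketch (the jar, class names, and application structure are illustrative, only the --queue flag matters here): the weekly LRJ goes to Q1 and the small jobs go to Q2, so that preemption frees resources for Q2 when needed.
# Weekly long-running job on the preemptable queue Q1
spark-submit --master yarn --deploy-mode cluster --queue Q1 --class com.example.LongRunningJob lrj.jar
# Small, high-priority jobs on queue Q2 (preemption disabled)
spark-submit --master yarn --deploy-mode cluster --queue Q2 --class com.example.SmallJob small-job.jar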
Source: https://stackoverflow.com/questions/60739639/resource-optimization-utilization-in-emr-for-long-running-job-and-multiple-small