Question
If I define CapacityScheduler queues in YARN as explained here:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
how do I make Spark use them?
I want to run Spark jobs, but they should not take over the whole cluster; instead they should execute on a CapacityScheduler queue that has a fixed set of resources allocated to it.
Is that possible, specifically on the Cloudera platform (given that Spark on Cloudera runs on YARN)?
Answer 1:
- Configure the CapacityScheduler to match your needs by editing capacity-scheduler.xml. You also need to set yarn.resourcemanager.scheduler.class in yarn-site.xml to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, which is also the default for current Hadoop versions (a sketch of these files is included at the end of this answer).
- Submit the Spark job to the designated queue, e.g.:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10
The --queue option indicates the queue the job is submitted to; the queue name must match one defined in your CapacityScheduler configuration.
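As a side note, --queue should be equivalent to setting the spark.yarn.queue property (which defaults to the queue named default), so the same job could also be submitted as:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.yarn.queue=thequeue \
    lib/spark-examples*.jar \
    10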
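For reference, here is a minimal sketch of what the capacity-scheduler.xml from step 1 might look like; the queue name thequeue and the 30/70 capacity split are placeholder values for illustration, not something from the question:

<!-- capacity-scheduler.xml: declare the queues under root -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,thequeue</value>
</property>
<!-- give thequeue 30% of the cluster... -->
<property>
  <name>yarn.scheduler.capacity.root.thequeue.capacity</name>
  <value>30</value>
</property>
<!-- ...and cap it at 30% so jobs in it never grow beyond that share -->
<property>
  <name>yarn.scheduler.capacity.root.thequeue.maximum-capacity</name>
  <value>30</value>
</property>
<!-- capacities of sibling queues must add up to 100 -->
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>70</value>
</property>

And in yarn-site.xml:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

Setting maximum-capacity equal to capacity is what gives the queue the "fixed set of resources" asked about, since it disables the queue's elastic growth. After editing capacity-scheduler.xml, the queue changes can be applied without restarting the ResourceManager:

$ yarn rmadmin -refreshQueues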
Source: https://stackoverflow.com/questions/36167378/hadoop-capacity-scheduler-and-spark