Question
If I define CapacityScheduler queues in YARN as explained here:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
how do I make Spark use them?
I want to run Spark jobs, but they should not take over the whole cluster; instead they should execute on a CapacityScheduler queue that has a fixed set of resources allocated to it.
Is that possible, specifically on the Cloudera platform (given that Spark on Cloudera runs on YARN)?
Answer 1:
- Configure the CapacityScheduler to match your needs by editing capacity-scheduler.xml. You also need to set yarn.resourcemanager.scheduler.class in yarn-site.xml to org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler, which is also the default for current Hadoop versions (a sketch of these files is included at the end of this answer).
- Submit the Spark job to the designated queue, e.g.:
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 1 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10
The --queue option indicates the queue the job is submitted to; the queue name must match one defined in your CapacityScheduler configuration.
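As a side note, --queue should be equivalent to setting the spark.yarn.queue property (which defaults to the queue named default), so the same job could also be submitted as:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    --conf spark.yarn.queue=thequeue \
    lib/spark-examples*.jar \
    10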
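For reference, here is a minimal sketch of what the capacity-scheduler.xml from step 1 might look like; the queue name thequeue and the 30/70 capacity split are placeholder values for illustration, not something from the question:

<!-- capacity-scheduler.xml: declare the queues under root -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,thequeue</value>
</property>
<!-- give thequeue 30% of the cluster... -->
<property>
  <name>yarn.scheduler.capacity.root.thequeue.capacity</name>
  <value>30</value>
</property>
<!-- ...and cap it at 30% so jobs in it never grow beyond that share -->
<property>
  <name>yarn.scheduler.capacity.root.thequeue.maximum-capacity</name>
  <value>30</value>
</property>
<!-- capacities of sibling queues must add up to 100 -->
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>70</value>
</property>

And in yarn-site.xml:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>

Setting maximum-capacity equal to capacity is what gives the queue the "fixed set of resources" asked about, since it disables the queue's elastic growth. After editing capacity-scheduler.xml, the queue changes can be applied without restarting the ResourceManager:

$ yarn rmadmin -refreshQueues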
Source: https://stackoverflow.com/questions/36167378/hadoop-capacity-scheduler-and-spark