Using spark-submit, what is the behavior of the --total-executor-cores option?

孤城傲影 2021-02-07 23:29

I am running a Spark cluster over C++ code wrapped in Python. I am currently testing different multi-threading configurations (at the Python level or the Spark level).

2 Answers
  • 2021-02-07 23:36

    The documentation does not seem clear.

    From my experience, the most common practice to allocate resources is by indicating the number of executors and the number of cores per executor, for example (taken from here):

    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 10 \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 4 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10
    

    However, this approach is limited to YARN: --num-executors is not applicable to standalone or Mesos-based Spark, according to this.

    Instead, the parameter --total-executor-cores can be used, which represents the total number of cores, across all executors, assigned to the Spark job. In your case, with a total of 40 cores, setting --total-executor-cores 40 would make use of all the available resources.
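
    As a minimal sketch of what such a submission could look like on a standalone cluster: the master URL spark://master-ip:7077 (7077 being the usual default port of a standalone master) and the application file my_job.py are placeholders assumed for illustration, not taken from the question.

    ```shell
    # Sketch only: the master URL and my_job.py are hypothetical placeholders.
    # --total-executor-cores caps the cores summed over all executors of this
    # application (standalone and Mesos deployments).
    ./bin/spark-submit \
      --master spark://master-ip:7077 \
      --total-executor-cores 40 \
      --executor-memory 2g \
      my_job.py
    ```

    Lowering the value below 40 would leave the remaining cores free for other applications on the same cluster.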

    Unfortunately, I am not aware of how Spark distributes the workload when fewer resources than the total available are requested. If you run two or more jobs simultaneously, however, this should be transparent to the user: Spark (or whichever resource manager is in use) allocates the resources according to the user settings.

  • 2021-02-07 23:37

    To check how many workers were started on each slave, open http://master-ip:8080 in a web browser and look at the Workers section, which shows exactly how many workers have been started and which worker runs on which slave. (I mention this because I am not sure what you mean by '4 slaves per node'.)

    By default, Spark starts exactly one worker on each slave unless you set SPARK_WORKER_INSTANCES=n in conf/spark-env.sh, where n is the number of worker instances you would like to start on each slave.
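
    For illustration, a conf/spark-env.sh fragment along these lines might look as follows. The values are assumptions, and SPARK_WORKER_CORES is added here because, with several worker instances on one machine, each worker would otherwise try to offer all of the machine's cores.

    ```shell
    # conf/spark-env.sh on each slave machine (illustrative values only)
    export SPARK_WORKER_INSTANCES=2   # start two worker processes on this machine
    export SPARK_WORKER_CORES=4       # cores each worker instance may offer to executors
    ```

    After restarting the workers, the Workers section of the master web UI should list two workers per machine.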

    When you submit a Spark job through spark-submit, Spark starts an application driver and several executors for your job.

    • Unless you specify otherwise, Spark starts one executor on each worker, i.e. the total number of executors equals the total number of workers, and all cores are available to the job.
    • The --total-executor-cores value you specify limits the total number of cores available to this application.
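
    The effect of the cap can be sketched with some simple arithmetic. This assumes a standalone cluster where a per-executor core count is also set through --executor-cores; the 40 comes from the question, the 4 is a hypothetical choice, and per-worker core and memory limits (ignored here) can reduce the count further.

    ```shell
    # Hypothetical values: a 40-core cap and 4 cores per executor.
    total_executor_cores=40   # value passed to --total-executor-cores
    executor_cores=4          # value passed to --executor-cores

    # Upper bound on executors: whole executors that fit under the cap.
    max_executors=$(( total_executor_cores / executor_cores ))

    # Cores left unassigned when the cap is not a multiple of executor_cores.
    unused_cores=$(( total_executor_cores % executor_cores ))

    echo "at most ${max_executors} executors, ${unused_cores} cores unused"
    ```

    Picking a cap that is a multiple of the per-executor core count avoids leaving part of the allowance unused.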