Using spark-submit, what is the behavior of the --total-executor-cores option?

Asked by 孤城傲影 on 2021-02-07 23:29

I am running a Spark cluster over C++ code wrapped in Python. I am currently testing different configurations of multi-threading options (at the Python level or at the Spark level).

2 Answers

    Answered by 栀梦 (OP) on 2021-02-07 23:36

    The documentation does not seem clear.

    From my experience, the most common way to allocate resources is to specify the number of executors and the number of cores per executor, for example (taken from here):

    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 10 \
    --driver-memory 4g \
    --executor-memory 2g \
    --executor-cores 4 \
    --queue thequeue \
    lib/spark-examples*.jar \
    10
    

    However, this approach is limited to YARN and is not applicable to standalone or Mesos-based Spark deployments, according to this.

    Instead, the parameter --total-executor-cores can be used; it represents the total number of cores, across all executors, assigned to the Spark job. In your case, with a total of 40 cores available, setting --total-executor-cores 40 would make use of all the available resources.
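
    As a minimal sketch of what such a submission could look like on a standalone cluster (the master URL spark://master:7077 and the script my_app.py are just placeholders, not from your setup), --total-executor-cores caps the job as a whole while --executor-cores controls how those cores are split into executors:

    # cap the job at 40 cores in total, 4 cores per executor
    # (Spark standalone would then spread this over up to 10 executors)
    $ ./bin/spark-submit \
    --master spark://master:7077 \
    --total-executor-cores 40 \
    --executor-cores 4 \
    --executor-memory 2g \
    my_app.py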

    Unfortunately, I am not aware of how Spark distributes the workload when fewer resources than the total available are requested. When running two or more simultaneous jobs, however, this should be transparent to the user, in that Spark (or whichever resource manager is in use) would decide how the resources are shared depending on the user settings.
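
    For the multi-job case, the configuration property spark.cores.max is the documented per-application cap on standalone/Mesos (as far as I can tell, it is what --total-executor-cores sets); giving each concurrent job a cap below the cluster total lets them run side by side instead of the first submitted job taking every core. A rough sketch, again assuming a standalone master at spark://master:7077 and two hypothetical scripts job_a.py and job_b.py:

    # each job is capped at 20 of the 40 cores, so both can run at the same time
    $ ./bin/spark-submit \
    --master spark://master:7077 \
    --conf spark.cores.max=20 \
    job_a.py

    $ ./bin/spark-submit \
    --master spark://master:7077 \
    --conf spark.cores.max=20 \
    job_b.py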
