List of spark-submit options

Submitted by 时间秒杀一切 on 2019-12-23 01:43:33

Question


There are a ton of tunable settings mentioned on the Spark configuration page. However, as noted here, the SparkSubmitOptionParser attribute name for a Spark property can differ from that property's name.

For instance, spark.executor.cores is passed as --executor-cores in spark-submit.


Where can I find an exhaustive list of all tuning parameters of Spark (along with their SparkSubmitOptionParser attribute names) that can be passed with the spark-submit command?


Answer 1:


While @suj1th's valuable inputs did solve my problem, I'm answering my own question to address my query directly.


  • You need not look up the SparkSubmitOptionParser attribute name for a given Spark property (configuration setting). Either form will do just fine. However, do note that there's a subtle difference in their usage, as shown below:

    spark-submit --executor-cores 2

    spark-submit --conf spark.executor.cores=2

    Both commands shown above have the same effect. The second form takes configurations in the format --conf <key>=<value>.

  • Enclosing values in quotes (correct me if this is incorrect / incomplete)

    (i) Values need not be enclosed in quotes of any kind (single '' or double ""), though you still can if you want.

    (ii) If the value contains a space character, enclose the entire thing in double quotes "", as in "<key>=<value>", as shown here.
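    As a sketch of point (ii): a value containing spaces must be wrapped in double quotes so that the shell passes the whole key=value pair as one argument to spark-submit. The JVM options and the application file my_app.py below are illustrative, not from the original question:

    ```shell
    # The value of spark.driver.extraJavaOptions contains a space,
    # so the entire key=value pair is wrapped in double quotes.
    spark-submit \
      --conf "spark.driver.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
      --conf spark.executor.cores=2 \
      my_app.py
    ```

    Without the quotes, the shell would split the value at the space and spark-submit would receive `-XX:+PrintGCTimeStamps` as a separate, meaningless argument.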

  • For a comprehensive list of all configurations that can be passed with spark-submit, just run spark-submit --help.

  • In this link provided by @suj1th, they say that:

    configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.

    If you are ever unclear where configuration options are coming from, you can print out fine-grained debugging information by running spark-submit with the --verbose option.
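    To illustrate that precedence order, suppose the same property were set in all three places; the value set on the SparkConf wins. The file contents, values, and my_app.py below are hypothetical, used only to show the ordering:

    ```shell
    # Hypothetical setup: spark.executor.cores set in three places.
    #   conf/spark-defaults.conf:  spark.executor.cores  2   (lowest precedence)
    #   command-line flag below:   --executor-cores 3        (middle precedence)
    #   inside the app:            conf.set("spark.executor.cores", "4")  (highest)
    # Effective value at runtime: 4.
    # --verbose prints where each resolved configuration value came from.
    spark-submit --verbose --executor-cores 3 my_app.py
    ```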


Following two links from Spark docs list a lot of configurations:

  • Spark Configuration
  • Running Spark on YARN



Answer 2:


In your case, you should actually load your configurations from a file, as mentioned in this document, instead of passing them as flags to spark-submit. This removes the overhead of mapping SparkSubmitArguments to Spark configuration parameters. To quote from the above document:

Loading default Spark configurations this way can obviate the need for certain flags to spark-submit. For instance, if the spark.master property is set, you can safely omit the --master flag from spark-submit. In general, configuration values explicitly set on a SparkConf take the highest precedence, then flags passed to spark-submit, then values in the defaults file.
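A minimal sketch of the file-based approach this answer describes: properties go into conf/spark-defaults.conf as whitespace-separated key/value pairs, and spark-submit reads them automatically, so the corresponding flags can be dropped. The property values and my_app.py are illustrative:

```shell
# conf/spark-defaults.conf -- one "key value" pair per line:
#
#   spark.master           yarn
#   spark.executor.cores   2
#   spark.executor.memory  4g

# With the defaults file in place, --master and --executor-cores
# can be omitted from the command line entirely:
spark-submit my_app.py

# A file other than conf/spark-defaults.conf can be pointed to
# explicitly with --properties-file:
spark-submit --properties-file /path/to/my-defaults.conf my_app.py
```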



Source: https://stackoverflow.com/questions/49381168/list-of-spark-submit-options
