How to pass environment variables to spark driver in cluster mode with spark-submit

有刺的猬 2021-01-01 18:33

spark-submit lets you configure the executor environment variables with --conf spark.executorEnv.FOO=bar, and the Spark REST API allows passing so…

5 Answers
  • 2021-01-01 18:45

    On YARN at least, this works:

    spark-submit --deploy-mode cluster --conf spark.yarn.appMasterEnv.FOO=bar myapp.jar


    It's mentioned in http://spark.apache.org/docs/latest/configuration.html#environment-variables that:

    Note: When running Spark on YARN in cluster mode, environment variables need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName] property in your conf/spark-defaults.conf file.

    I have tested that it can also be passed with the --conf flag to spark-submit, so you don't have to edit global conf files.
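
    To verify, here is a minimal PySpark driver sketch (hypothetical check_env.py, not from the original answer) that reads the variable set via spark.yarn.appMasterEnv.FOO:

        import os
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()

        # In YARN cluster mode the driver runs inside the Application Master,
        # so spark.yarn.appMasterEnv.FOO=bar should surface here as $FOO:
        print(os.environ.get("FOO"))  # expected: "bar"

        spark.stop()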

  • 2021-01-01 18:51

    Did you test with

    --conf spark.driver.FOO="bar"
    

    and then reading the value with

    spark.conf.get("spark.driver.FOO")
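
    Note that spark.driver.FOO passed this way is an arbitrary Spark configuration property, not an OS environment variable. A minimal PySpark sketch of the difference (assuming YARN cluster mode and the FOO variable from the accepted answer):

        import os
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.getOrCreate()

        # Readable because --conf spark.driver.FOO=bar stores a Spark conf entry:
        print(spark.conf.get("spark.driver.FOO", None))

        # Only set if spark.yarn.appMasterEnv.FOO=bar was also passed; plain
        # spark.driver.FOO does not create a process environment variable:
        print(os.environ.get("FOO"))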
    
  • 2021-01-01 18:56

    On YARN in cluster mode, it worked for me by adding the environment variables to the spark-submit command using --conf, as below:

    spark-submit --master yarn-cluster --num-executors 15 --executor-memory 52g --executor-cores 7 --driver-memory 52g --conf "spark.yarn.appMasterEnv.FOO=/Path/foo" --conf "spark.executorEnv.FOO2=/path/foo2" app.jar

    Alternatively, you can set them in the conf/spark-defaults.conf file, as sketched below.
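
    For reference, a sketch of the equivalent conf/spark-defaults.conf entries, using the property names and example paths from the command above:

        spark.yarn.appMasterEnv.FOO   /Path/foo
        spark.executorEnv.FOO2        /path/foo2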

  • 2021-01-01 18:57

    You can use the classification below (the configuration classification format used by Amazon EMR) to set up environment variables on the executor and master nodes:

    [
      {
        "Classification": "yarn-env",
        "Properties": {},
        "Configurations": [
          {
            "Classification": "export",
            "Properties": {
              "VARIABLE_NAME": "VARIABLE_VALUE"
            }
          }
        ]
      }
    ]
    

    If you just set spark.yarn.appMasterEnv.FOO = "foo", the environment variable won't be present on the executor instances; see the spark-submit sketch below that sets both.
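
    A minimal sketch (plain YARN cluster mode, outside EMR) that sets the same hypothetical FOO variable for both the driver (Application Master) and the executors, reusing myapp.jar from the first answer:

        spark-submit --deploy-mode cluster \
          --conf spark.yarn.appMasterEnv.FOO=foo \
          --conf spark.executorEnv.FOO=foo \
          myapp.jar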

  • 2021-01-01 18:58

    Yes, that is possible. Whatever variables you need, you can pass them to spark-submit the way you're already doing:

    spark-submit --deploy-mode cluster myapp.jar
    

    Take the variables from http://spark.apache.org/docs/latest/configuration.html and use whichever suit your optimization needs. This link could also be helpful.

    I used to run in cluster mode, but now I'm running on YARN, so my variables are as follows (hopefully helpful):

    hastimal@nm:/usr/local/spark$ ./bin/spark-submit --class  com.hastimal.Processing  --master yarn-cluster  --num-executors 15 --executor-memory 52g --executor-cores 7 --driver-memory 52g  --driver-cores 7 --conf spark.default.parallelism=105 --conf spark.driver.maxResultSize=4g --conf spark.network.timeout=300  --conf spark.yarn.executor.memoryOverhead=4608 --conf spark.yarn.driver.memoryOverhead=4608 --conf spark.akka.frameSize=1200  --conf spark.io.compression.codec=lz4 --conf spark.rdd.compress=true --conf spark.broadcast.compress=true --conf spark.shuffle.spill.compress=true --conf spark.shuffle.compress=true --conf spark.shuffle.manager=sort /users/hastimal/Processing.jar Main_Class /inputRDF/rdf_data_all.nt /output /users/hastimal/ /users/hastimal/query.txt index 2
    

    Here, the values following my jar are the arguments to the main class:

    cc /inputData/data_all.txt /output /users/hastimal/ /users/hastimal/query.txt index 2
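
    For comparison, a sketch of how a PySpark application (a hypothetical process.py, not the author's Scala jar) would receive such trailing arguments:

        import sys
        from pyspark.sql import SparkSession

        # Submit with e.g.:
        #   spark-submit --master yarn --deploy-mode cluster process.py cc /inputData/data_all.txt /output
        spark = SparkSession.builder.getOrCreate()

        # Everything after the application file shows up in sys.argv:
        args = sys.argv[1:]
        print(args)  # ['cc', '/inputData/data_all.txt', '/output', ...]

        spark.stop()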
