spark-submit lets you configure executor environment variables with --conf spark.executorEnv.FOO=bar, and the Spark REST API lets you pass some environment variables as well.
On YARN at least, this works:
spark-submit --deploy-mode cluster --conf spark.yarn.appMasterEnv.FOO=bar myapp.jar
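As a quick sanity check, here is a minimal sketch (the variable name FOO and the object name are just illustrative): in YARN cluster mode the driver runs inside the Application Master, so a variable set via spark.yarn.appMasterEnv.FOO shows up in the driver's process environment.

// Minimal driver-side check (illustrative names): prints the env var set via
// spark.yarn.appMasterEnv.FOO when running in YARN cluster mode.
object MyApp {
  def main(args: Array[String]): Unit = {
    println(sys.env.getOrElse("FOO", "FOO is not set"))
  }
}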
It's mentioned in http://spark.apache.org/docs/latest/configuration.html#environment-variables that:

Note: When running Spark on YARN in cluster mode, environment variables need to be set using the spark.yarn.appMasterEnv.[EnvironmentVariableName] property in your conf/spark-defaults.conf file.
I have tested that it can be passed with the --conf flag of spark-submit, so you don't have to edit global conf files.
Did you test with --conf spark.driver.FOO="bar" and then get the value with spark.conf.get("spark.driver.FOO")?
On YARN in cluster mode, it worked by adding the environment variables to the spark-submit command with --conf, as below:
spark-submit --master yarn-cluster --num-executors 15 --executor-memory 52g --executor-cores 7 --driver-memory 52g --conf "spark.yarn.appMasterEnv.FOO=/Path/foo" --conf "spark.executorEnv.FOO2=/path/foo2" app.jar
Alternatively, you can add them to the conf/spark-defaults.conf file.
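For reference, the equivalent entries in conf/spark-defaults.conf would look roughly like this (same illustrative variable names and paths as the command above):

spark.yarn.appMasterEnv.FOO    /Path/foo
spark.executorEnv.FOO2         /path/foo2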
You can use the classification below (an Amazon EMR configuration) to set up environment variables on the executor and master nodes:
[
  {
    "Classification": "yarn-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
If you just set spark.yarn.appMasterEnv.FOO = "foo", then the env variable won't be present on the executor instances.
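A minimal sketch to verify where each variable lands (variable names are the same illustrative ones as above): FOO set via spark.yarn.appMasterEnv is visible in the driver, while FOO2 set via spark.executorEnv is visible inside tasks running on the executors.

import org.apache.spark.sql.SparkSession

object WhereDoEnvVarsLand {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("WhereDoEnvVarsLand").getOrCreate()
    // Set via spark.yarn.appMasterEnv.FOO: visible to the driver (the YARN Application Master).
    println("driver sees FOO = " + sys.env.getOrElse("FOO", "not set"))
    // Set via spark.executorEnv.FOO2: visible inside tasks on the executors.
    val seenByExecutors = spark.sparkContext
      .parallelize(1 to 3)
      .map(_ => sys.env.getOrElse("FOO2", "not set"))
      .collect()
      .distinct
    println("executors see FOO2 = " + seenByExecutors.mkString(", "))
    spark.stop()
  }
}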
Yes, that is possible. Whatever variables you need, you can pass them in spark-submit, like you're doing:
spark-submit --deploy-mode cluster myapp.jar
Take the variables from http://spark.apache.org/docs/latest/configuration.html and, depending on your optimization needs, use them. This link could also be helpful.
I used to run in cluster mode, but now I'm running on YARN, so my variables are as follows (hopefully helpful):
hastimal@nm:/usr/local/spark$ ./bin/spark-submit --class com.hastimal.Processing --master yarn-cluster --num-executors 15 --executor-memory 52g --executor-cores 7 --driver-memory 52g --driver-cores 7 --conf spark.default.parallelism=105 --conf spark.driver.maxResultSize=4g --conf spark.network.timeout=300 --conf spark.yarn.executor.memoryOverhead=4608 --conf spark.yarn.driver.memoryOverhead=4608 --conf spark.akka.frameSize=1200 --conf spark.io.compression.codec=lz4 --conf spark.rdd.compress=true --conf spark.broadcast.compress=true --conf spark.shuffle.spill.compress=true --conf spark.shuffle.compress=true --conf spark.shuffle.manager=sort /users/hastimal/Processing.jar Main_Class /inputRDF/rdf_data_all.nt /output /users/hastimal/ /users/hastimal/query.txt index 2
In this, the following, which come after my jar, are the arguments of the class:

cc /inputData/data_all.txt /output /users/hastimal/ /users/hastimal/query.txt index 2
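Purely as an illustration (the real com.hastimal.Processing class isn't shown in the post, and all of the parameter names below are hypothetical), positional arguments placed after the jar arrive, in order, in the args array of the main method:

object Processing {
  def main(args: Array[String]): Unit = {
    // Hypothetical mapping of the seven positional arguments shown above.
    val Array(mode, inputPath, outputPath, workDir, queryFile, indexName, parallelism) = args
    println(s"mode=$mode input=$inputPath output=$outputPath parallelism=$parallelism")
  }
}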