I need to set a custom environment variable in EMR so that it is available when running a Spark application.
I have tried adding this:
Add custom configurations like the JSON below to a file, say custom_config.json:
[
  {
    "Classification": "spark-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "VARIABLE_NAME": "VARIABLE_VALUE"
        }
      }
    ]
  }
]
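Hand-written JSON is easy to get wrong: an unquoted value or a trailing comma makes the file invalid. A minimal sketch, assuming you generate the file from Python instead, lets json.dump handle the quoting. VARIABLE_NAME and VARIABLE_VALUE are placeholders, and write_config and env_classification are hypothetical helper names. Property values are coerced to strings, since EMR configuration properties are string-to-string maps.

```python
import json

# Placeholder variables; replace with your own names and values.
ENV_VARS = {"VARIABLE_NAME": "VARIABLE_VALUE"}


def env_classification(parent, env_vars):
    """Build one EMR configuration object that exports env_vars through the
    nested "export" classification under `parent` (e.g. "spark-env")."""
    return {
        "Classification": parent,
        "Properties": {},
        "Configurations": [
            {
                "Classification": "export",
                # EMR property values are strings, so coerce everything.
                "Properties": {k: str(v) for k, v in env_vars.items()},
            }
        ],
    }


def write_config(path, env_vars, parent="spark-env"):
    """Write a valid configurations file for --configurations file://..."""
    configs = [env_classification(parent, env_vars)]
    with open(path, "w") as f:
        json.dump(configs, f, indent=2)
    return configs


if __name__ == "__main__":
    write_config("custom_config.json", ENV_VARS)
```

Passing parent="yarn-env" produces the same structure under the yarn-env classification.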
Then, when creating the EMR cluster, pass the file reference to the --configurations option:
aws emr create-cluster --configurations file://custom_config.json --other-options...
Use the yarn-env classification to pass environment variables to the worker nodes, where the executors run.
Use the spark-env classification to pass environment variables to the driver when using client deploy mode. In cluster deploy mode, use yarn-env instead.
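The rule of thumb above can be encoded as a small helper. This is a sketch that only restates what these answers report; it is not checked against the EMR documentation. env_classifications and build_configurations are hypothetical names, and the variable name and value are placeholders.

```python
import json


def env_classifications(deploy_mode):
    """Return the EMR *-env classifications to use, following the rule of
    thumb in the answers above (not an official EMR mapping)."""
    if deploy_mode == "client":
        # Driver runs in the spark-submit process -> spark-env;
        # executors run in YARN containers on worker nodes -> yarn-env.
        return ["spark-env", "yarn-env"]
    if deploy_mode == "cluster":
        # Driver and executors both run in YARN containers -> yarn-env only.
        return ["yarn-env"]
    raise ValueError(f"unknown deploy mode: {deploy_mode!r}")


def build_configurations(env_vars, deploy_mode):
    """Build the --configurations list that exports env_vars under each
    classification chosen for the given deploy mode."""
    return [
        {
            "Classification": parent,
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {k: str(v) for k, v in env_vars.items()},
                }
            ],
        }
        for parent in env_classifications(deploy_mode)
    ]


if __name__ == "__main__":
    configs = build_configurations({"VARIABLE_NAME": "VARIABLE_VALUE"}, "cluster")
    print(json.dumps(configs, indent=2))
```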
For me, replacing spark-env with yarn-env fixed the issue.