How do I set an environment variable in a YARN Spark job?


刺人心 2020-12-18 10:26

I'm attempting to access Accumulo 1.6 from an Apache Spark job (written in Java) by using an AccumuloInputFormat with newAPIHadoopRDD. In order to

1 answer

隐瞒了意图╮ (original poster)

2020-12-18 10:46

So I discovered the answer to this while writing the question (sorry, reputation seekers). The problem is that CDH5 uses Spark 1.0.0, and that I was running the job via YARN. Apparently, YARN mode ignores the executor environment and instead reads the environment variable SPARK_YARN_USER_ENV to set up the environment of the processes it launches. Making sure that SPARK_YARN_USER_ENV contains ACCUMULO_CONF_DIR=/etc/accumulo/conf works: ACCUMULO_CONF_DIR is then visible in the environment at the point indicated in the question's source example.
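As a rough illustration of the fix above, here is a minimal Java sketch that puts SPARK_YARN_USER_ENV into the environment of the process that would launch the job. The class name YarnUserEnv, the helper methods, the spark-submit arguments, and my-job.jar are all hypothetical. The comma-separated KEY=value format is an assumption based on my reading of the Spark 1.0 YARN documentation, so check it against your version. The sketch only builds the launcher and prints the value; it does not start anything.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class YarnUserEnv {

    /**
     * Joins the entries as KEY=value pairs separated by commas.
     * Assumption: this is the format SPARK_YARN_USER_ENV expects in Spark 1.0.x.
     */
    public static String format(Map<String, String> vars) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> entry : vars.entrySet()) {
            joiner.add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    /**
     * Builds (but does not start) a launcher process whose environment
     * carries SPARK_YARN_USER_ENV with the given variables.
     */
    public static ProcessBuilder launcher(Map<String, String> vars, String... command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().put("SPARK_YARN_USER_ENV", format(vars));
        return builder;
    }

    public static void main(String[] args) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("ACCUMULO_CONF_DIR", "/etc/accumulo/conf");

        // Hypothetical command line; replace it with your real invocation.
        ProcessBuilder builder =
                launcher(vars, "spark-submit", "--master", "yarn-cluster", "my-job.jar");

        // Show what the launched process would see; call builder.start() to actually run it.
        System.out.println("SPARK_YARN_USER_ENV="
                + builder.environment().get("SPARK_YARN_USER_ENV"));
    }
}
```

If you launch the job from a shell instead, exporting the same variable before calling the submit script should have the same effect.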

This difference in how standalone mode and YARN mode work resulted in SPARK-1680, which is reported as fixed in Spark 1.1.0.
