I'm attempting to access Accumulo 1.6 from an Apache Spark job (written in Java) by using an AccumuloInputFormat with newAPIHadoopRDD. In order to
So I discovered the answer to this while writing the question (sorry, reputation seekers). The problem is that CDH5 uses Spark 1.0.0, and I was running the job via YARN. Apparently, YARN mode ignores the executor environment settings and instead uses the environment variable SPARK_YARN_USER_ENV to control the executors' environment. Ensuring that SPARK_YARN_USER_ENV contains ACCUMULO_CONF_DIR=/etc/accumulo/conf works: it makes ACCUMULO_CONF_DIR visible in the environment at the indicated point in the question's source example.
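If SPARK_YARN_USER_ENV is already set for other reasons, the new entry has to be merged into it rather than overwriting it. Here is a minimal sketch of that merge in Java, assuming the variable holds a comma-separated list of NAME=value pairs (for example `JAVA_HOME=/jdk64,FOO=bar`). The class `YarnUserEnv` and its method `withVariable` are hypothetical helpers written for this answer, not part of Spark or Accumulo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a Spark or Accumulo API.
final class YarnUserEnv {

    // Merge name=value into an existing SPARK_YARN_USER_ENV value.
    // Assumes a comma-separated list of NAME=value pairs; an earlier
    // entry with the same name is replaced in place.
    static String withVariable(String existing, String name, String value) {
        Map<String, String> vars = new LinkedHashMap<>();
        if (existing != null && !existing.trim().isEmpty()) {
            for (String entry : existing.split(",")) {
                int eq = entry.indexOf('=');
                if (eq > 0) { // skip malformed entries that have no name
                    vars.put(entry.substring(0, eq).trim(),
                             entry.substring(eq + 1).trim());
                }
            }
        }
        vars.put(name, value);

        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : vars.entrySet()) {
            if (out.length() > 0) {
                out.append(',');
            }
            out.append(e.getKey()).append('=').append(e.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Read whatever is currently set (may be null) and print the
        // export line to run in the shell before submitting the job.
        String current = System.getenv("SPARK_YARN_USER_ENV");
        String merged = withVariable(current, "ACCUMULO_CONF_DIR", "/etc/accumulo/conf");
        System.out.println("export SPARK_YARN_USER_ENV=\"" + merged + "\"");
    }
}
```

Running `main` in the shell you submit from prints an `export` line you can paste before launching the job. Inside the job itself, `System.getenv("ACCUMULO_CONF_DIR")` is a quick way to confirm that the variable actually reached the executors.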
This difference in how standalone mode and YARN mode work resulted in SPARK-1680, which is reported as fixed in Spark 1.1.0.