I have a Spark (version 1.4.1) application on HDP 2.3. It works fine in YARN-client mode. However, when running it in YARN-cluster mode, none of my Hive tables can be found by the application.
I posted this same question on the Hortonworks community, and I resolved the issue with the help of this answer.
The gist of it is this: when submitting the application, the --files argument has to come before the --jars argument, and the copy of hive-site.xml to use is the one in the Spark conf dir, not the one in $HIVE_HOME/conf/hive-site.xml. Hence:
./bin/spark-submit \
--class com.myCompany.Main \
--master yarn-cluster \
--num-executors 3 \
--driver-memory 1g \
--executor-memory 11g \
--executor-cores 1 \
--files /usr/hdp/current/spark-client/conf/hive-site.xml \
--jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-rdbms-3.2.9.jar,lib/datanucleus-core-3.2.10.jar \
/home/spark/apps/YarnClusterTest.jar
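For reference, nothing special is needed inside the application itself once the correct hive-site.xml is shipped with --files. A minimal sketch of what a main class like com.myCompany.Main might look like on Spark 1.4's HiveContext API (the class body and the table listing are illustrative assumptions, not the actual application):

package com.myCompany

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object Main {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("YarnClusterTest"))
    // HiveContext reads hive-site.xml from the classpath; in yarn-cluster
    // mode that is the copy localized into the container by --files.
    val hiveContext = new HiveContext(sc)
    // If the metastore is wired up correctly, this lists the real Hive
    // tables rather than those of an empty local metastore.
    hiveContext.sql("SHOW TABLES").show()
    sc.stop()
  }
}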
If you are able to fetch data using the Hive CLI, then use that same hive-site.xml in your Spark job. The likely cause is the metastore location defined in hive-site.xml: if Spark does not pick up the right file, it falls back to a local embedded metastore that knows nothing about your Hive tables.
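One way to verify from inside the job that the driver can actually see the shipped hive-site.xml is to check the classpath, since that is where HiveContext's HiveConf loads it from. A small diagnostic sketch, assuming that in yarn-cluster mode the container's working directory (where --files places the file) is on the classpath; HiveSiteCheck is a hypothetical helper, submitted the same way as the command above:

import org.apache.spark.{SparkConf, SparkContext}

object HiveSiteCheck {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveSiteCheck"))
    // null here means HiveConf will fall back to default (local) metastore
    // settings, which is exactly the symptom of missing Hive tables.
    val url = getClass.getClassLoader.getResource("hive-site.xml")
    println("hive-site.xml found at: " + url)
    sc.stop()
  }
}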