I am trying to set up Apache Hive on Spark on AWS EMR 5.11.0 (Apache Spark 2.2.1, Apache Hive 2.3.2). The YARN logs show the error below:
18/01/28 21:55:28
I was able to run Hive on Spark by launching it like this:
HIVE_AUX_JARS_PATH=$(find /usr/lib/spark/jars/ -name '*.jar' -and -not -name '*slf4j-log4j12*' -printf '%p:' | head -c-1) hive
Then, before issuing any other SQL queries, run:
SET hive.execution.engine = spark;
To make this permanent, add the line
export HIVE_AUX_JARS_PATH=$(find /usr/lib/spark/jars/ -name '*.jar' -and -not -name '*slf4j-log4j12*' -printf '%p:' | head -c-1)
to /home/hadoop/.bashrc
And in the file /etc/hive/conf/hive-site.xml, set:
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
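Before starting hive, it can help to check what the find command above actually produces. The sketch below wraps the same command in a function so the resulting path can be inspected; the function name build_aux_jars_path is made up for illustration, and it assumes GNU find and head, as the original command does.

```shell
# Hypothetical helper: build a colon-separated list of jars in a directory,
# skipping the slf4j-log4j12 binding, exactly as the command above does.
# Defaults to the EMR Spark jar directory mentioned in the text.
build_aux_jars_path() {
  local dir="${1:-/usr/lib/spark/jars}"
  # -printf '%p:' emits each path followed by a colon;
  # head -c-1 drops the final trailing colon.
  find "$dir" -name '*.jar' -and -not -name '*slf4j-log4j12*' -printf '%p:' | head -c-1
}
```

For example, `build_aux_jars_path | tr ':' '\n' | grep -c .` counts the jars that would end up in HIVE_AUX_JARS_PATH, and `build_aux_jars_path | grep slf4j-log4j12` should print nothing.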
Spark on EMR supports Hive 1.2.1, not Hive 2.x. Could you please check the versions of the Hive jars in the /usr/lib/spark/jars/ directory? SPARK_RPC_SERVER_ADDRESS was added in Hive 2.x.
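One way to do that check is to list the Hive-related jars and read the version off the file names. This is only a sketch: the function name list_hive_jars is made up for illustration, it assumes GNU find, and it assumes the jar file names carry the version number.

```shell
# Hypothetical helper: print the file names of Hive-related jars in a directory.
# Defaults to the EMR Spark jar directory mentioned in the text.
list_hive_jars() {
  local dir="${1:-/usr/lib/spark/jars}"
  # -maxdepth 1 keeps the search to the directory itself;
  # -printf '%f\n' prints only the file name, one per line.
  find "$dir" -maxdepth 1 -name '*hive*.jar' -printf '%f\n' | sort
}
```

Running `list_hive_jars` on the master node prints one jar name per line. If the names show 1.2.1 rather than 2.x, that would be consistent with the version mismatch described here.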
The dependencies in your sbt build (or the equivalent in pom.xml) should look like this:
// sparkVersion is assumed to be defined elsewhere in the build
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-hive" % sparkVersion % "provided"
)
I am running a data warehouse (Hive) on EMR, and my Spark application stores its data in that warehouse.
Sorry, but Hive on Spark is not yet supported on EMR. I have not tried it myself yet, but I think the likely cause of your errors might be a mismatch between the version of Spark supported on EMR and the version of Spark upon which Hive depends. The last time I checked, Hive did not support Spark 2.x when running Hive on Spark. Given that your first error is a NoSuchFieldError, it seems like a version mismatch is the most likely cause. The timeout error may be a red herring.