I'm using HiveContext with SparkSQL and I'm trying to connect to a remote Hive metastore; the only way to set the Hive metastore seems to be through including the hive-site.xml on the classpath.
For Spark 1.x, you can set it with:
System.setProperty("hive.metastore.uris", "thrift://METASTORE:9083");
final SparkConf conf = new SparkConf();
SparkContext sc = new SparkContext(conf);
HiveContext hiveContext = new HiveContext(sc);
Or set it on the HiveContext after it has been created:
final SparkConf conf = new SparkConf();
SparkContext sc = new SparkContext(conf);
HiveContext hiveContext = new HiveContext(sc);
// the metastore client is initialized lazily, so setting the URI on the context also works
hiveContext.setConf("hive.metastore.uris", "thrift://METASTORE:9083");
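Either way, a quick query is a handy sanity check (the query itself is just illustrative):
// should list tables from the remote metastore rather than a fresh local Derby one
hiveContext.sql("SHOW TABLES").show();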
Update: if your Hive is Kerberized, try setting these before creating the HiveContext:
System.setProperty("hive.metastore.sasl.enabled", "true");
System.setProperty("hive.security.authorization.enabled", "false");
System.setProperty("hive.metastore.kerberos.principal", hivePrincipal);
System.setProperty("hive.metastore.execute.setugi", "true");
In Hadoop 3, Spark and Hive use separate catalogs, so:
For spark-shell (it comes with .enableHiveSupport() by default), just try:
spark-shell --conf spark.hadoop.metastore.catalog.default=hive
For a spark-submit job, create your SparkSession like this:
SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
then add this conf to your spark-submit command:
--conf spark.hadoop.metastore.catalog.default=hive
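Putting both pieces together, a minimal sketch (the class name Test and the jar name test.jar are placeholders):
import org.apache.spark.sql.SparkSession;

public class Test {
    public static void main(String[] args) {
        // enableHiveSupport() wires the session to the Hive catalog
        SparkSession spark = SparkSession.builder()
                .appName("Test")
                .enableHiveSupport()
                .getOrCreate();
        spark.sql("SHOW DATABASES").show();
        spark.stop();
    }
}
submitted with:
spark-submit --conf spark.hadoop.metastore.catalog.default=hive --class Test test.jar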
But for ORC tables (and, more generally, internal/managed tables) it is recommended to use the Hive Warehouse Connector.
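A minimal sketch of the HWC usage pattern, assuming the connector assembly jar is on the classpath and spark.sql.hive.hiveserver2.jdbc.url is configured for your HiveServer2 (the table name is a placeholder):
import com.hortonworks.hwc.HiveWarehouseSession;

// build an HWC session on top of the existing SparkSession
HiveWarehouseSession hive = HiveWarehouseSession.session(spark).build();
// reads a managed (internal) ORC table through HiveServer2/LLAP
hive.executeQuery("SELECT * FROM my_orc_table").show();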