How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?

I'm using HiveContext with SparkSQL and I'm trying to connect to a remote Hive metastore; the only way to set the Hive metastore seems to be through including the hive-site.xml on the classpath. Is there a way to set it programmatically, without the xml file?

8 answers
  • 2020-11-22 13:03

    For Spark 1.x, you can set it with:

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.sql.hive.HiveContext;

    // Set the metastore URI before the SparkContext is created.
    System.setProperty("hive.metastore.uris", "thrift://METASTORE:9083");

    final SparkConf conf = new SparkConf();
    SparkContext sc = new SparkContext(conf);
    HiveContext hiveContext = new HiveContext(sc);
    

    Or

    final SparkConf conf = new SparkConf();
    SparkContext sc = new SparkContext(conf);
    HiveContext hiveContext = new HiveContext(sc);
    // Equivalent: set the metastore URI on the HiveContext itself.
    hiveContext.setConf("hive.metastore.uris", "thrift://METASTORE:9083");
    
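    In Spark 2.x, HiveContext was superseded by SparkSession; a minimal sketch of the same idea (the thrift URI and the app name here are placeholders, not values from the question):

    import org.apache.spark.sql.SparkSession;

    // Configure the metastore URI on the builder before the first
    // SparkSession is created; no hive-site.xml is required.
    SparkSession spark = SparkSession.builder()
        .appName("remote-metastore-example")  // hypothetical app name
        .config("hive.metastore.uris", "thrift://METASTORE:9083")
        .enableHiveSupport()
        .getOrCreate();
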

    Update: if your Hive is Kerberized, try setting these before creating the HiveContext:

    // hivePrincipal is the metastore's Kerberos principal for your
    // cluster, e.g. "hive/_HOST@EXAMPLE.COM" (an assumed example).
    System.setProperty("hive.metastore.sasl.enabled", "true");
    System.setProperty("hive.security.authorization.enabled", "false");
    System.setProperty("hive.metastore.kerberos.principal", hivePrincipal);
    System.setProperty("hive.metastore.execute.setugi", "true");
    
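    If the JVM does not already hold a Kerberos ticket, you may also need to log in from a keytab before the metastore is first contacted. A sketch using Hadoop's UserGroupInformation; the principal and keytab path are assumptions to replace with your cluster's values:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.security.UserGroupInformation;

    Configuration hadoopConf = new Configuration();
    hadoopConf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(hadoopConf);
    // Hypothetical principal and keytab path; replace with your own.
    UserGroupInformation.loginUserFromKeytab(
        "user@EXAMPLE.COM", "/etc/security/keytabs/user.keytab");
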
  • 2020-11-22 13:06

    In Hadoop 3, the Spark and Hive catalogs are separated, so:

    For spark-shell (it comes with .enableHiveSupport() by default), just try:

    spark-shell --conf spark.hadoop.metastore.catalog.default=hive
    

    For a spark-submit job, create your SparkSession like this:

    SparkSession.builder.appName("Test").enableHiveSupport().getOrCreate()
    

    Then add this conf to your spark-submit command:

    --conf spark.hadoop.metastore.catalog.default=hive
    

    But for ORC tables (and, more generally, internal tables) it is recommended to use the Hive Warehouse Connector; a combined invocation is sketched below.
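
    Putting the pieces together, a hedged spark-submit example: the class name and jar are hypothetical, and spark.hadoop.hive.metastore.uris is one way to pass the metastore URI through to the Hive client without a hive-site.xml:

    spark-submit \
      --conf spark.hadoop.metastore.catalog.default=hive \
      --conf spark.hadoop.hive.metastore.uris=thrift://METASTORE:9083 \
      --class com.example.Test app.jar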
