How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?

我寻月下人不归 asked 2020-11-22 12:07 · 8 answers · 1275 views

I'm using HiveContext with Spark SQL and I'm trying to connect to a remote Hive metastore; the only way to set the Hive metastore is through including the hive-site.xml on …

8 answers
  • 2020-11-22 12:39

    The code below worked for me. For a local metastore we can omit the hive.metastore.uris setting; Spark will then create the Hive objects locally in the spark-warehouse directory.

    import org.apache.spark.sql.SparkSession

    object spark_hive_support1 {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession
          .builder()
          .master("yarn")
          .appName("Test Hive Support")
          // for a remote metastore, point this at its thrift endpoint
          // (hive.metastore.uris takes a thrift:// URL, not a JDBC URL):
          //.config("hive.metastore.uris", "thrift://localhost:9083")
          .enableHiveSupport()
          .getOrCreate()

        import spark.implicits._

        val testdf = Seq(("Word1", 1), ("Word4", 4), ("Word8", 8)).toDF("word", "count")
        testdf.show()
        testdf.write.mode("overwrite").saveAsTable("WordCount")
      }
    }
    
  • 2020-11-22 12:39

    Some of the similar questions are marked as duplicates. This answer shows how to connect to Hive from Spark without using hive.metastore.uris or a separate thrift server (port 9083), and without copying hive-site.xml to SPARK_CONF_DIR.

    import org.apache.spark.sql.SparkSession
    val spark = SparkSession
      .builder()
      .appName("hive-check")
      .config(
        "spark.hadoop.javax.jdo.option.ConnectionURL",
        "JDBC_CONNECT_STRING"
      )
      .config(
        "spark.hadoop.javax.jdo.option.ConnectionDriverName",
        "org.postgresql.Driver"
      )
      .config("spark.sql.warehouse.dir", "/user/hive/warehouse")
      .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "JDBC_USER")
      .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "JDBC_PASSWORD")
      .enableHiveSupport()
      .getOrCreate()
    spark.catalog.listDatabases.show(false)
    
  • 2020-11-22 12:46

    Setting spark.hadoop.metastore.catalog.default=hive worked for me.
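    For reference, one place to pass that property without touching hive-site.xml is the spark-submit command line; the host name, script name, and the extra metastore URI shown here are illustrative only:

    ```shell
    # sketch: pass the catalog setting (and, if needed, the metastore
    # thrift endpoint) as --conf flags; my_job.py and the host are placeholders
    spark-submit \
      --master yarn \
      --conf spark.hadoop.metastore.catalog.default=hive \
      --conf spark.hadoop.hive.metastore.uris=thrift://metastore-host:9083 \
      my_job.py
    ```

    The same keys can equally be set via .config(...) on the SparkSession builder, as the other answers do.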

  • 2020-11-22 12:49

    Spark Version : 2.0.2

    Hive Version : 1.2.1

    Below Java code worked for me to connect to Hive metastore from Spark:

    import org.apache.spark.sql.SparkSession;
    
    public class SparkHiveTest {
    
        public static void main(String[] args) {
    
            SparkSession spark = SparkSession
                      .builder()
                      .appName("Java Spark Hive Example")
                      .config("spark.master", "local")
                      .config("hive.metastore.uris", "thrift://abc123.com:9083")
                      .config("spark.sql.warehouse.dir", "/apps/hive/warehouse")
                      .enableHiveSupport()
                      .getOrCreate();
    
            spark.sql("SELECT * FROM default.survey_data limit 5").show();
        }
    }
    
  • 2020-11-22 12:56

    I faced the same problem too, and resolved it. Just follow these steps (Spark 2.0):

    Step 1: Copy the hive-site.xml file from the Hive conf folder into the Spark conf folder.

    Step 2: Edit the spark-env.sh file and configure your MySQL driver there (if you are using MySQL as the Hive metastore),

    or add the MySQL driver as a Maven/SBT dependency (if you use those).
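    For the SBT route, the dependency line looks roughly like this (the version number is illustrative; match it to your MySQL server):

    ```scala
    // build.sbt — hypothetical version, adjust to your environment
    libraryDependencies += "mysql" % "mysql-connector-java" % "5.1.49"
    ```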

    Step 3: When creating the Spark session, add enableHiveSupport():

    val spark = SparkSession.builder.master("local").appName("testing").enableHiveSupport().getOrCreate()

    Sample code:

    package sparkSQL

    /**
      * Created by venuk on 7/12/16.
      */

    import org.apache.spark.sql.SparkSession

    object hivetable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.master("local[*]").appName("hivetable").enableHiveSupport().getOrCreate()

        spark.sql("create table hivetab (name string, age int, location string) row format delimited fields terminated by ',' stored as textfile")
        spark.sql("load data local inpath '/home/hadoop/Desktop/asl' into table hivetab")
        val x = spark.sql("select * from hivetab")
        x.show()
        // write to a separate table: saving back into the table being read fails in Spark
        x.write.mode("overwrite").saveAsTable("hivetab_copy")
      }
    }
    


  • 2020-11-22 13:02

    In Spark 2.0+ it should look something like this:

    Don't forget to replace the "hive.metastore.uris" value with your own. This assumes that you already have a Hive metastore service started (not a HiveServer2).

    val spark = SparkSession
      .builder()
      .appName("interfacing spark sql to hive metastore without configuration file")
      .config("hive.metastore.uris", "thrift://localhost:9083") // replace with your hive metastore service's thrift url
      .enableHiveSupport() // don't forget to enable hive support
      .getOrCreate()

    import spark.implicits._
    import spark.sql

    // create an arbitrary frame
    val frame = Seq(("one", 1), ("two", 2), ("three", 3)).toDF("word", "count")
    // see the frame created
    frame.show()
    /**
     * +-----+-----+
     * | word|count|
     * +-----+-----+
     * |  one|    1|
     * |  two|    2|
     * |three|    3|
     * +-----+-----+
     */
    // write the frame
    frame.write.mode("overwrite").saveAsTable("t4")
    