How to add Hive properties at runtime in spark-shell

无人共我 2021-01-05 20:58

How do you set a Hive property such as hive.metastore.warehouse.dir at runtime? Or is there at least a more dynamic way of setting a property like the one above than putting

1 answer
    栀梦 (original poster)
    2021-01-05 21:24

    I faced the same issue, and setting the Hive properties from Spark (2.4.0) worked for me. The options below cover spark-shell, spark-submit, and SparkConf.

    Option 1 (spark-shell)

    spark-shell --conf spark.hadoop.hive.metastore.warehouse.dir=some_path\metastore_db_2
    

    Initially I ran spark-shell with hive.metastore.warehouse.dir (without the spark.hadoop prefix) set to some_path\metastore_db_2 and got the following warning:

    Warning: Ignoring non-spark config property: hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2

    Even so, I created a Hive table with:

    bigDf.write.mode("overwrite").saveAsTable("big_table")
    

    The Hive metadata was stored correctly under the metastore_db_2 folder.

    When I use spark.hadoop.hive.metastore.warehouse.dir instead, the warning disappears and the results are still saved in the metastore_db_2 directory.
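
    One way to check that the prefixed property actually reached Hadoop is to read it back inside the running shell. The lines below are a minimal sketch, assuming a spark-shell started with the --conf flag from Option 1; `spark` is the session that spark-shell creates, and the value names hadoopValue and sparkValue are only illustrative.

```scala
// Run inside a spark-shell started with the --conf flag from Option 1.
// `spark` is the SparkSession that spark-shell creates automatically.

// The Hadoop configuration should hold the key WITHOUT the spark.hadoop prefix
// (get returns null if the key is absent).
val hadoopValue = spark.sparkContext.hadoopConfiguration.get("hive.metastore.warehouse.dir")
println(s"hive.metastore.warehouse.dir = $hadoopValue")

// The original, prefixed key is still visible in the Spark configuration.
val sparkValue = spark.sparkContext.getConf.getOption("spark.hadoop.hive.metastore.warehouse.dir")
println(s"spark.hadoop.hive.metastore.warehouse.dir = $sparkValue")

// Spark SQL exposes the warehouse location it uses under this key.
println(s"spark.sql.warehouse.dir = ${spark.conf.get("spark.sql.warehouse.dir")}")
```

    If the first value prints as null, the property probably never made it into the Hadoop configuration.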

    Option 2 (spark-submit)

    To use hive.metastore.warehouse.dir when submitting a job with spark-submit, I followed these steps.

    First I wrote some code to save some sample data with Hive:

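    package org.tests

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // Entry point matching the --class org.tests.Main argument used with spark-submit.
    object Main {
      def main(args: Array[String]): Unit = {
        val sparkConf = new SparkConf().setAppName("metastore_test").setMaster("local")
        val spark = SparkSession.builder().config(sparkConf).getOrCreate()

        import spark.implicits._

        // Small sample dataset; the values are arbitrary.
        val dfA = spark.createDataset(Seq(
          (1, "val1", "p1"),
          (2, "val1", "p2"),
          (3, "val2", "p3"),
          (3, "val3", "p4"))).toDF("id", "value", "p")

        // Save the data as a table, then read it back to confirm it was written.
        dfA.write.mode("overwrite").saveAsTable("metastore_test")
        spark.sql("select * from metastore_test").show(false)
      }
    }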
    

    Next I submitted the job with:

    spark-submit --class org.tests.Main \
            --conf spark.hadoop.hive.metastore.warehouse.dir=C:\winutils\hadoop-2.7.1\bin\metastore_db_2 \
            spark-scala-test_2.11-0.1.jar
    

    The metastore_test table was properly created under the C:\winutils\hadoop-2.7.1\bin\metastore_db_2 folder.

    Option 3 (SparkConf)

    Set the property on the SparkConf that is passed to the SparkSession in the Spark code.

    val sparkConf = new SparkConf()
          .setAppName("metastore_test")
          .set("spark.hadoop.hive.metastore.warehouse.dir", "C:\\winutils\\hadoop-2.7.1\\bin\\metastore_db_2")
          .setMaster("local")
    

    This attempt was successful as well.
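
    The same key can also be passed straight to the SparkSession builder instead of going through a separate SparkConf. This is a minimal sketch, assuming spark-sql is on the classpath; the object name MetastoreBuilderSketch is hypothetical, and the path is the one used above. As far as I know the setting has to be in place before the session is first created, because the Hadoop configuration is built together with the SparkContext.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only for this illustration.
object MetastoreBuilderSketch {
  def main(args: Array[String]): Unit = {
    // Path copied from the options above; adjust it for your machine.
    val warehouseDir = "C:\\winutils\\hadoop-2.7.1\\bin\\metastore_db_2"

    val spark = SparkSession.builder()
      .appName("metastore_test")
      .master("local")
      // Same prefixed key as in the SparkConf variant.
      .config("spark.hadoop.hive.metastore.warehouse.dir", warehouseDir)
      .getOrCreate()

    // Read the value back from the Hadoop configuration, where the key
    // appears without the spark.hadoop prefix.
    println(spark.sparkContext.hadoopConfiguration.get("hive.metastore.warehouse.dir"))

    spark.stop()
  }
}
```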

    The remaining question is why the property has to be prefixed with spark.hadoop in order to work as expected.
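
    As far as I understand it, two things are involved. spark-submit and spark-shell drop any --conf key that does not start with "spark." (which is where the warning comes from), and Spark copies every "spark.hadoop.*" entry into the Hadoop configuration with the prefix stripped. The sketch below models that behaviour with plain Scala maps. It is a simplified illustration, not Spark's actual source, and the object and method names are made up for this example.

```scala
// Simplified model of how a --conf key travels from the command line to the
// Hadoop configuration. This is NOT Spark's source code; all names are hypothetical.
object SparkHadoopPrefixSketch {
  val SparkPrefix = "spark."
  val HadoopPrefix = "spark.hadoop."

  // Step 1: the launcher keeps only keys that start with "spark."; other keys
  // are dropped (Spark logs "Ignoring non-spark config property" for them).
  def acceptedByLauncher(confArgs: Map[String, String]): Map[String, String] =
    confArgs.filter { case (key, _) => key.startsWith(SparkPrefix) }

  // Step 2: entries starting with "spark.hadoop." are copied into the Hadoop
  // configuration with that prefix removed.
  def toHadoopConf(sparkConf: Map[String, String]): Map[String, String] =
    sparkConf.collect {
      case (key, value) if key.startsWith(HadoopPrefix) =>
        key.stripPrefix(HadoopPrefix) -> value
    }

  def main(args: Array[String]): Unit = {
    val path = "some_path/metastore_db_2" // placeholder path

    val bare = Map("hive.metastore.warehouse.dir" -> path)
    val prefixed = Map("spark.hadoop.hive.metastore.warehouse.dir" -> path)

    // The bare key is filtered out in step 1, so nothing reaches Hadoop.
    println(toHadoopConf(acceptedByLauncher(bare)))
    // The prefixed key survives step 1 and is renamed in step 2.
    println(toHadoopConf(acceptedByLauncher(prefixed)))
  }
}
```

    In this model only the prefixed key ends up as hive.metastore.warehouse.dir on the Hadoop side, which would explain why the warning disappears once spark.hadoop is added.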
