How to write a table to Hive from Spark without using the warehouse connector in HDP 3.1

萌比男神i 2021-01-07 10:55

When trying to use Spark 2.3 on HDP 3.1 to write to a Hive table directly into Hive's schema, without the warehouse connector, using:

spark-shell --driver-memor         
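
The write being attempted looks roughly like the sketch below (a minimal, hypothetical illustration, since the spark-shell command above is truncated; the database, table, and column names are made up):

    // Hypothetical sketch: write straight into the Hive metastore from spark-shell,
    // i.e. without going through the Hive Warehouse Connector.
    // (Run inside spark-shell, where `spark` and its implicits are already available.)
    import org.apache.spark.sql.SaveMode
    val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
    df.write.mode(SaveMode.Overwrite).saveAsTable("my_db.my_table")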


        
3 Answers
  • 2021-01-07 11:10

    Inside Ambari, simply disabling the option to create transactional tables by default solves my problem.

    Set the following to false twice (once for Hive on Tez, once for Hive LLAP):

    hive.strict.managed.tables = false

    and enable it manually in each table's properties if a transactional table is desired (see the sketch below).
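
    A minimal sketch of enabling the transactional property for a single table through HWC (the database, table, and column names are hypothetical, and `executeUpdate` is assumed to be available for DDL in this HWC version):

        // Hypothetical sketch: create one specific table as transactional (ACID)
        // even though transactional tables are no longer created by default.
        import com.hortonworks.hwc.HiveWarehouseSession
        val hive = HiveWarehouseSession.session(spark).build()
        hive.executeUpdate(
          """CREATE TABLE my_db.my_acid_table (id INT, value STRING)
            |STORED AS ORC
            |TBLPROPERTIES ('transactional' = 'true')""".stripMargin)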

  • 2021-01-07 11:20

    Creating an external table (as a workaround) seems to be the best option for me. This still involves HWC, but only to register the column metadata and update the partition information.

    Something along these lines:

    import org.apache.spark.sql.{DataFrame, SaveMode}
    import com.hortonworks.hwc.HiveWarehouseSession

    val df: DataFrame = ...
    val tableName = "my_db.my_table"
    val externalPath = "/warehouse/tablespace/external/hive/my_db.db/my_table"
    val hive = HiveWarehouseSession.session(spark).build()

    // Write the ORC files directly to the external location, bypassing HWC for the data itself
    df.write.partitionBy("part_col").option("compression", "zlib").mode(SaveMode.Overwrite).orc(externalPath)

    // Derive the column list from the DataFrame schema, excluding the partition column
    val columns = df.drop("part_col").schema.fields.map(field => s"${field.name} ${field.dataType.simpleString}").mkString(", ")
    val ddl =
      s"""
         |CREATE EXTERNAL TABLE $tableName ($columns)
         |PARTITIONED BY (part_col string)
         |STORED AS ORC
         |LOCATION '$externalPath'
       """.stripMargin

    // Register the table and its partitions through HWC
    hive.execute(ddl)
    hive.execute(s"MSCK REPAIR TABLE $tableName SYNC PARTITIONS")
    

    Unfortunately, this throws the following exception from HWC:

    java.sql.SQLException: The query did not generate a result set!

  • 2021-01-07 11:37

    Did you try:

        data.write \
            .mode("append") \
            .insertInto("tableName")
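
    For the spark-shell (Scala) case from the question, the equivalent would be roughly the following sketch (`data` and `"tableName"` are placeholders, and the target table is assumed to already exist in Hive):

        // Scala equivalent of the snippet above, for use in spark-shell.
        data.write
          .mode("append")
          .insertInto("tableName")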
    