Overwrite Hive partitions using Spark

北荒 2021-02-05 21:44

I am working with AWS and I have workflows that use Spark and Hive. My data is partitioned by date, so every day I have a new partition in my S3 storage. My problem is when …

4 Answers
    名媛妹妹 2021-02-05 22:33

    If you are on Spark 2.3.0 or later, try setting spark.sql.sources.partitionOverwriteMode to dynamic. The data must be written into a partitioned table, and the write mode must be overwrite.

    // Overwrite only the partitions present in the incoming data,
    // instead of truncating the whole table first
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
    data.write.mode("overwrite").insertInto("partitioned_table")
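
    For context, here is a minimal end-to-end sketch in Scala of how a single daily partition can be replaced without touching earlier dates. The table name `sales`, the partition column `dt`, and the S3 path are hypothetical placeholders; the table is assumed to already exist as a Hive table partitioned by `dt`, since insertInto writes by column position into an existing table.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit

    val spark = SparkSession.builder()
      .appName("dynamic-partition-overwrite")
      .enableHiveSupport()
      .getOrCreate()

    // Only the partitions present in the written data are replaced.
    spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

    // Hypothetical daily load; the partition column must be the last column
    // so that it lines up with the table schema for insertInto.
    val todays = spark.read.parquet("s3://my-bucket/incoming/2021-02-05/")
      .withColumn("dt", lit("2021-02-05"))

    // Replaces only the dt=2021-02-05 partition; other date partitions stay intact.
    todays.write.mode("overwrite").insertInto("sales")

    Without partitionOverwriteMode set to dynamic (the default is static), the same overwrite call would drop all existing partitions of the target table, which is exactly what you want to avoid for daily incremental loads.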
    
