How to create AWS Glue table where partitions have different columns? ('HIVE_PARTITION_SCHEMA_MISMATCH')

隐瞒了意图╮ 2020-12-24 01:50

As per this AWS Forum Thread, does anyone know how to use AWS Glue to create an AWS Athena table whose partitions contain different schemas (in this case different subsets o

4 Answers
  • 2020-12-24 02:07

    I had the same issue and solved it by configuring the crawler to update the table metadata for preexisting partitions.
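
A minimal sketch of applying that crawler setting programmatically, assuming a boto3 Glue client is passed in and reusing the `InheritFromTable` configuration JSON quoted in a later answer. The function names and the crawler name `my_crawler` are illustrative only.

```python
import json


def inherit_from_table_configuration() -> str:
    """Build the crawler configuration JSON that makes partitions
    inherit their metadata from the table."""
    return json.dumps(
        {
            "Version": 1.0,
            "CrawlerOutput": {
                "Partitions": {"AddOrUpdateBehavior": "InheritFromTable"}
            },
        }
    )


def apply_to_crawler(glue_client, crawler_name: str) -> None:
    """Update an existing crawler in place.

    `glue_client` is assumed to be a boto3 Glue client,
    i.e. the object returned by boto3.client("glue").
    """
    glue_client.update_crawler(
        Name=crawler_name,
        Configuration=inherit_from_table_configuration(),
    )
```

With boto3 installed and credentials configured, this would be called as `apply_to_crawler(boto3.client("glue"), "my_crawler")`. The crawler should not be running when it is updated.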

  • 2020-12-24 02:12

    Despite selecting "Update all new and existing partitions with metadata from the table" in the crawler's configuration, the crawler still occasionally failed to set the expected parameters for all partitions (specifically, jsonPath wasn't inherited from the table's properties in my case).

    As suggested in https://docs.aws.amazon.com/athena/latest/ug/updates-and-partitions.html, what helped was "to drop the partition that is causing the error and recreate it".

    After I dropped the problematic partitions, the Glue crawler re-created them correctly on its next run.
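
The drop-and-recrawl step could be scripted roughly as follows. This is a sketch under assumptions: `glue_client` is a boto3 Glue client, and the table name and partition values in the usage note are hypothetical placeholders, not taken from the original setup.

```python
from typing import Iterable, Sequence


def drop_and_recrawl(
    glue_client,
    database: str,
    table: str,
    partitions: Iterable[Sequence[str]],
    crawler_name: str,
) -> int:
    """Delete the given partitions from the Glue Data Catalog, then start the crawler.

    Each entry in `partitions` holds the partition key values in the
    table's partition-key order. Only catalog metadata is removed;
    the underlying objects in S3 are left untouched.
    Returns the number of partitions deleted.
    """
    dropped = 0
    for values in partitions:
        glue_client.delete_partition(
            DatabaseName=database,
            TableName=table,
            PartitionValues=list(values),
        )
        dropped += 1
    # The next crawl re-creates the dropped partitions from the data in S3.
    glue_client.start_crawler(Name=crawler_name)
    return dropped
```

For example, `drop_and_recrawl(boto3.client("glue"), "my_glue_database", "my_table", [["2020", "12", "24"]], "my_crawler")` would drop one partition and trigger a crawl. The same drop can be done from Athena with `ALTER TABLE ... DROP PARTITION`.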

  • 2020-12-24 02:17

    It also fixed my issue! If somebody needs to provision a crawler with this configuration using Terraform, here is how I did it:

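    resource "aws_glue_crawler" "crawler-s3-rawdata" {
      database_name = "my_glue_database"
      name          = "my_crawler"
      # Placeholder: supply the crawler's IAM role name or ARN here,
      # for example an unquoted reference to an aws_iam_role resource's arn.
      role          = "my_iam_role.arn"

      # Make partitions inherit their metadata from the table
      # instead of keeping their own crawled schema.
      configuration = <<EOF
    {
       "Version": 1.0,
       "CrawlerOutput": {
          "Partitions": { "AddOrUpdateBehavior": "InheritFromTable" }
       }
    }
    EOF

      # S3 location that the crawler scans.
      s3_target {
        path = "s3://mybucket"
      }
    }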
    
  • 2020-12-24 02:29

    This helped me. Posting the image for others in case the link is lost.
