Hive doesn't read partitioned parquet files generated by Spark

离开以前 2020-12-14 23:28

I'm having a problem reading partitioned Parquet files generated by Spark in Hive. I'm able to create the external table in Hive, but when I try to select a few rows, Hive ...

2 answers
  • 2020-12-14 23:53

    I finally found the problem. When you create a table in Hive over partitioned data that already exists in S3 or HDFS, you need to run a command that updates the Hive Metastore with the table's partition structure. Take a look here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)

    The command is:
    
    MSCK REPAIR TABLE table_name;
    
    
    And on Hive running in Amazon EMR you can use:
    
    ALTER TABLE table_name RECOVER PARTITIONS;
    
  • 2020-12-15 00:09

    Even though this question has already been answered, the following points may help anyone who still cannot solve the issue with MSCK REPAIR TABLE table_name; alone.

    I have an HDFS directory that is partitioned as follows:

    <parquet_file>/<partition1>/<partition2>

    e.g. my_file.pq/column_5=test/column_6=5

    I created a Hive table whose partition columns match that layout, e.g.:

    CREATE EXTERNAL TABLE myschema.my_table(
    `column_1` int,
    `column_2` string,
    `column_3` string,
    `column_4` string
    )
    PARTITIONED BY (`column_5` string, `column_6` int) STORED AS PARQUET
    LOCATION
      'hdfs://u/users/iamr/my_file.pq';
    

    I then repaired the table's partitions with the following command:

    MSCK REPAIR TABLE myschema.my_table;

    After that, it started working for me.
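
    To make the repair step more concrete: MSCK REPAIR TABLE registers the key=value directories found under the table location as partitions in the metastore. The sketch below is only an illustration of that idea, not how Hive implements it. It builds the ALTER TABLE ... ADD PARTITION statement that would register one such directory by hand. The helper names (partition_spec, add_partition_statement) and the int_columns parameter are hypothetical, and partition values are assumed to contain no quotes or escaped characters.

```python
def partition_spec(path):
    """Extract (key, value) pairs from the key=value segments of a path."""
    spec = []
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, value = segment.split("=", 1)
            spec.append((key, value))
    return spec


def add_partition_statement(table, table_location, relative_path, int_columns=()):
    """Build an ALTER TABLE ... ADD PARTITION statement for one directory.

    int_columns lists partition columns declared as int in the table,
    so their values are emitted without quotes.
    """
    parts = []
    for key, value in partition_spec(relative_path):
        # String partition values are quoted; int values are left bare.
        literal = value if key in int_columns else "'" + value + "'"
        parts.append(key + "=" + literal)
    location = table_location.rstrip("/") + "/" + relative_path.strip("/")
    return (
        "ALTER TABLE " + table + " ADD IF NOT EXISTS PARTITION ("
        + ", ".join(parts) + ") LOCATION '" + location + "';"
    )


if __name__ == "__main__":
    print(add_partition_statement(
        "myschema.my_table",
        "hdfs://u/users/iamr/my_file.pq",
        "column_5=test/column_6=5",
        int_columns=("column_6",),
    ))
```

    Running it with the table and path from this answer prints a single statement with PARTITION (column_5='test', column_6=5). Doing this per directory is tedious, which is why MSCK REPAIR TABLE is usually the more convenient option.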

    Another thing I noticed: when writing Parquet files from Spark, name the columns in lower case; otherwise Hive may not be able to map them. For me, it started working after I renamed the columns in the Parquet files.

    For example, my_file.pq/COLUMN_5=test/COLUMN_6=5 did not work for me,

    but my_file.pq/column_5=test/column_6=5 worked.
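
    One way to apply the lower-case rule before writing is sketched below. The helper name lowercase_columns is hypothetical, and the DataFrame `df` in the comments is assumed to exist; the collision check is an extra precaution that is not part of the original fix.

```python
def lowercase_columns(columns):
    """Return the column names in lower case, failing if two names collide."""
    lowered = [name.lower() for name in columns]
    seen = set()
    for name in lowered:
        if name in seen:
            raise ValueError("column names collide after lower-casing: " + name)
        seen.add(name)
    return lowered


# With a PySpark DataFrame `df` (assumed to exist), the rename and the
# partitioned write could look like this:
#   df = df.toDF(*lowercase_columns(df.columns))
#   df.write.partitionBy("column_5", "column_6").parquet("hdfs://u/users/iamr/my_file.pq")

if __name__ == "__main__":
    print(lowercase_columns(["COLUMN_5", "COLUMN_6"]))
```

    Renaming before the write matters because the partition directory names are taken from the column names, so COLUMN_5 in the DataFrame becomes COLUMN_5=... on disk.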
