How can I partition a table with HIVE?

终归单人心 2021-02-09 13:31

I've been playing with Hive for a few days now, but I still have a hard time with partitioning.

I've been recording Apache logs (Combined format) in Hadoop for a few months. Th

1 answer
  • 2021-02-09 14:02

    If I understand correctly, your files sit in folders four levels deep under the logs directory. In that case, define your table as external with the path 'logs' and partition it by four virtual columns: year, month, day_of_month, hour_of_day.

    The partitioning is essentially done for you by Flume.

    EDIT 3/9: A lot of the details depend on exactly how Flume writes files. But in general terms, your DDL should look something like this:

    -- EXTERNAL: Hive reads the files in place and does not delete them on DROP TABLE
    CREATE EXTERNAL TABLE table_name(fields...)
    PARTITIONED BY(log_year STRING, log_month STRING,
        log_day_of_month STRING, log_hour_of_day STRING)
    format description
    STORED AS TEXTFILE
    LOCATION '/your user path/logs';
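
    Once partitions are registered, filtering on the partition columns lets Hive read only the matching folders instead of scanning everything under logs. A minimal sketch, assuming the table from the DDL above; the date values are made up for illustration:

    ```sql
    -- Hypothetical date; only the folders for the matching partitions are read.
    SELECT COUNT(*)
    FROM table_name
    WHERE log_year = '2012'
      AND log_month = '03'
      AND log_day_of_month = '15';
    ```

    Note that the partition columns are STRING here, so the values have to match what was registered exactly ('03', not '3').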
    

    EDIT 3/15: Per zzarbi's request, I'm adding a note that after the table is created, Hive needs to be told which partitions exist. This has to be repeated for as long as Flume or another process keeps creating new partitions. See my answer to the "Create external with Partition" question.
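
    For reference, registering one partition could look roughly like this. It is only a sketch: I'm assuming Flume writes plain folders such as logs/2012/03/15/10 (your layout may differ), and the date is hypothetical. Since such folder names are not in the key=value form, the LOCATION has to be given explicitly for each partition:

    ```sql
    -- Assumed folder layout: <logs>/<year>/<month>/<day>/<hour>
    ALTER TABLE table_name ADD IF NOT EXISTS
    PARTITION (log_year = '2012', log_month = '03',
               log_day_of_month = '15', log_hour_of_day = '10')
    LOCATION '/your user path/logs/2012/03/15/10';

    -- List what Hive currently knows about
    SHOW PARTITIONS table_name;
    ```

    IF NOT EXISTS makes the statement safe to re-run, so a scheduled job can issue it for every new hour without checking first.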
