Table Partitioned by Timestamp Field

遇见更好的自我 2021-02-04 17:26

To generate some summary figures, we periodically import data into Hive. We currently use CSV files with the following layout:

oper         


        
1 Answer
  •  南笙 (original poster)
    2021-02-04 18:02

    It sounds like you are looking for dynamic partitioning. Hive supports dynamic partition inserts, as detailed in this article.

    First, create a staging table to hold your flat data, with no partitions at all. In your case this would be:

    CREATE TABLE 
        flatTable (type string, id int, ts bigint, user string, key string) 
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    

    Then load your flat data file into this table:

    LOAD DATA LOCAL INPATH
        '/home/spaeth/tmp/hadoop-billing-data/extracted/testData.csv'
    INTO TABLE flatTable;
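
    The dynamic partition insert also needs a target table that is partitioned on the derived date, and its definition is not shown here. A minimal sketch, assuming a single data column user and a string partition column time (both inferred from the SELECT list of the insert query, not confirmed by the question), might look like this:

    ```sql
    -- Hypothetical definition of the target table; the column list is an
    -- assumption based on the columns selected by the insert query.
    -- `user` is a reserved keyword in newer Hive releases (and `time` may be
    -- as well), so both are quoted with backticks here.
    CREATE TABLE partitionedTable (
        `user` string
    )
    PARTITIONED BY (`time` string);
    ```

    The partition column is declared only in PARTITIONED BY, not in the regular column list. Hive stores its value in the directory name rather than in the data files.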
    

    At that point you can use a dynamic partition insert. It requires two properties:

    • hive.exec.dynamic.partition must be set to true. Dynamic partitioning was disabled by default in older Hive releases (before 0.9.0).
    • hive.exec.dynamic.partition.mode must be set to nonstrict, because your only partition column is dynamic and strict mode requires at least one static partition column.

    So you can run the following query:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    FROM
        flatTable
    INSERT OVERWRITE TABLE
        partitionedTable
    PARTITION(time)
    SELECT
        -- the dynamic partition column must come last in the SELECT list
        user, from_unixtime(ts, 'yyyy-MM-dd') AS time;
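
    Because every distinct value of the last selected column becomes a partition, it can help to preview the derived dates on a few rows first. The query below is only a sketch: derived_day is an illustrative alias, and it assumes ts holds Unix time in seconds, which is what from_unixtime expects.

    ```sql
    -- Preview the partition value that each row would be assigned to.
    SELECT
        ts,
        from_unixtime(ts, 'yyyy-MM-dd') AS derived_day
    FROM flatTable
    LIMIT 10;

    -- If ts turned out to be in milliseconds (an assumption, not stated in the
    -- question), it would need to be converted to seconds first:
    -- SELECT from_unixtime(CAST(ts / 1000 AS BIGINT), 'yyyy-MM-dd') FROM flatTable LIMIT 10;
    ```

    Dates that look implausible here (for example, far in the future) usually point to a unit mismatch in ts and would otherwise show up as unexpected partitions.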
    

    This should spawn 2 MapReduce jobs, and at the end you should see something along the lines of:

    Loading data to table default.partitionedtable partition (time=null)
        Loading partition {time=2013-02-10}
        Loading partition {time=2013-02-11}
        Loading partition {time=2013-02-13}
        Loading partition {time=2013-06-09}
    

    To verify that the partitions were actually created:

    $ hadoop fs -ls /user/hive/warehouse/partitionedTable/
    Found 4 items
    drwxr-xr-x   - username supergroup          0 2013-11-25 18:35 /user/hive/warehouse/partitionedTable/time=2013-02-10
    drwxr-xr-x   - username supergroup          0 2013-11-25 18:35 /user/hive/warehouse/partitionedTable/time=2013-02-11
    drwxr-xr-x   - username supergroup          0 2013-11-25 18:35 /user/hive/warehouse/partitionedTable/time=2013-02-13
    drwxr-xr-x   - username supergroup          0 2013-11-25 18:35 /user/hive/warehouse/partitionedTable/time=2013-06-09
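
    The same check can be done from within Hive instead of listing the warehouse directory. A short sketch, assuming the table and partition column names used in this answer (row_count is an illustrative alias):

    ```sql
    -- List the partitions Hive has registered for the table.
    SHOW PARTITIONS partitionedTable;

    -- Count the rows that landed in each partition.
    SELECT `time`, COUNT(*) AS row_count
    FROM partitionedTable
    GROUP BY `time`;
    ```

    SHOW PARTITIONS reads from the metastore, so it also confirms that the partitions are registered with Hive and not merely present as directories on HDFS.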
    

    Note that dynamic partition inserts are only supported in Hive 0.6 and later, so this will not work on older versions.
