To generate some summary figures, we periodically import data into Hive. We currently use a CSV file format with the following layout:
oper
It seems like you are looking for dynamic partitioning, and Hive supports dynamic partition inserts as detailed in this article.
First, you need to create a temporary table where you will put your flat data with no partition at all. In your case this would be:
CREATE TABLE
flatTable (type string, id int, ts bigint, user string, key string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Then, you should load your flat data file into this table:
LOAD DATA LOCAL INPATH
'/home/spaeth/tmp/hadoop-billing-data/extracted/testData.csv'
INTO TABLE flatTable;
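The insert below writes into a table called partitionedTable, which has to exist beforehand. Its definition isn't shown here, so this is only a minimal sketch, assuming it holds just the user column and is partitioned by a string column named time (both inferred from the SELECT in the insert query):

```sql
-- Assumed schema, inferred from the insert query; adjust the columns to your needs.
-- The partition column (time) is declared in PARTITIONED BY, not in the column list.
CREATE TABLE partitionedTable (user string)
PARTITIONED BY (time string);
```

On newer Hive versions where user is a reserved word, you would probably need to quote that column name with backticks.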
At that point you can use the dynamic partition insert. Keep in mind that you'll need the following properties:
hive.exec.dynamic.partition should be set to true, because dynamic partitioning is disabled by default, I believe.
hive.exec.dynamic.partition.mode should be set to nonstrict, because you have a single partition column and strict mode requires at least one static partition.
So you can run the following query:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
FROM
flatTable
INSERT OVERWRITE TABLE
partitionedTable
PARTITION(time)
SELECT
user, from_unixtime(ts, 'yyyy-MM-dd') AS time;
This should spawn 2 MapReduce jobs, and at the end you should see something along the lines of:
Loading data to table default.partitionedtable partition (time=null)
Loading partition {time=2013-02-10}
Loading partition {time=2013-02-11}
Loading partition {time=2013-02-13}
Loading partition {time=2013-06-09}
To verify that your partitions are indeed there:
$ hadoop fs -ls /user/hive/warehouse/partitionedTable/
Found 4 items
drwxr-xr-x - username supergroup 0 2013-11-25 18:35 /user/hive/warehouse/partitionedTable/time=2013-02-10
drwxr-xr-x - username supergroup 0 2013-11-25 18:35 /user/hive/warehouse/partitionedTable/time=2013-02-11
drwxr-xr-x - username supergroup 0 2013-11-25 18:35 /user/hive/warehouse/partitionedTable/time=2013-02-13
drwxr-xr-x - username supergroup 0 2013-11-25 18:35 /user/hive/warehouse/partitionedTable/time=2013-06-09
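You can also check from within Hive itself. A short sketch, using the table name from above:

```sql
-- List the partitions Hive has registered for the table.
SHOW PARTITIONS partitionedTable;

-- Count the rows that ended up in each partition.
SELECT time, count(*) FROM partitionedTable GROUP BY time;
```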
Please note that dynamic partitions are only supported since Hive 0.6, so if you have an older version this is probably not going to work.
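In that case a static partition insert, one statement per date, should still work. A rough sketch for a single date, where the date value is just one of the partitions from the output above:

```sql
-- Static partition: the partition value is given explicitly,
-- so the partition column is not part of the SELECT list.
INSERT OVERWRITE TABLE partitionedTable
PARTITION (time='2013-02-10')
SELECT user
FROM flatTable
WHERE from_unixtime(ts, 'yyyy-MM-dd') = '2013-02-10';
```

You would have to repeat this for every date in the data, which is exactly what dynamic partitioning saves you from.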