How to insert/copy one partition's data to multiple partitions in hive?

核能气质少年 提交于 2019-12-10 10:19:52

问题


I'm having data of day='2019-01-01' in my hive table, I want to copy same data to whole Jan-2019 month. (i.e. in '2019-01-02', '2019-01-03'...'2019-01-31')

I'm trying following but data is only inserted in '2019-01-02' and not in '2019-01-03'.

INSERT OVERWRITE TABLE db_t.students PARTITION(dt='2019-01-02', dt='2019-01-03')
SELECT id, name, marks FROM db_t.students WHERE dt='2019-01-01';

回答1:


Cross join all your data with calendar dates for required date range. Use dynamic partitioning:

set hivevar:start_date=2019-01-02; 
set hivevar:end_date=2019-01-31; 

set hive.exec.dynamic.partition=true; 
set hive.exec.dynamic.partition.mode=nonstrict;  

with date_range as 
(--this query generates date range
select date_add ('${hivevar:start_date}',s.i) as dt 
  from ( select posexplode(split(space(datediff('${hivevar:end_date}','${hivevar:start_date}')),' ')) as (i,x) ) s
)

INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
SELECT id, name, marks, r.dt --partition column is the last one
  FROM db_t.students s 
       CROSS JOIN date_range r
 WHERE s.dt='2019-01-01'
DISTRIBUTE BY r.dt;

One more possible solution is to copy partition data using hadoop fs -cp or hadoop distcp (repeat for each partition or use loop in the shell ):

hadoop fs -cp '/usr/warehouse/students/dt=2019-01-01' '/usr/warehouse/students/dt=2019-01-02'

And one more solution using UNION ALL:

    set hive.exec.dynamic.partition=true; 
    set hive.exec.dynamic.partition.mode=nonstrict;      

    INSERT OVERWRITE TABLE db_t.students PARTITION(dt)
    SELECT id, name, marks, '2019-01-02' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
    UNION ALL
     SELECT id, name, marks, '2019-01-03' as dt FROM db_t.students s WHERE s.dt='2019-01-01'
    UNION ALL
     SELECT id, name, marks, '2019-01-04' as dt FROM db_t.students s WHERE s.dt='2019-01-01' 
    UNION ALL
    ... 
  ;


来源:https://stackoverflow.com/questions/56071235/how-to-insert-copy-one-partitions-data-to-multiple-partitions-in-hive

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!