Dropping a range of partitions in HIVE

Submitted by 折月煮酒 on 2020-01-10 19:34:07

Question


I have a Hive (version 0.11.0) table partitioned by a column named date, of type string. Is there a way in Hive to drop the partitions for a range of dates (say, from 'date1' to 'date2')? I have tried the following SQL-style queries, but they do not seem to be syntactically valid:

ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' and date<='date2');

ALTER TABLE myTable DROP IF EXISTS PARTITION
(date>='date1' && date<='date2');

ALTER TABLE myTable DROP IF EXISTS PARTITION
(date between 'date1' and 'date2');

Answer 1:


I don't think Hive offers a direct solution for this so far. I worked around the issue with a shell script that generates the statements, for instance:

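# Write one ALTER TABLE statement per month, listing every (day, hour) partition.
for y in {2011..2014}
do
  for m in {01..12}
  do
    # The all-zero dummy partition comes first so each real one can be appended with a leading comma.
    echo -n "ALTER TABLE reporting.frontend DROP IF EXISTS PARTITION (year=0000,month=00,day=00,hour=00)"
    for d in {01..31}
    do
      # Hours 00 through 23 cover the whole day.
      for h in {00..23}
      do
        echo -n ", PARTITION (year=$y,month=$m,day=$d,hour=$h)"
      done
    done
    echo ";"
  done
done > drop_partitions_v1.hql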

The resulting .hql file can be executed with the -f option of hive (or beeline).

The loops have to generate exactly the range you want to drop, which can be nontrivial. In the worst case you will need several such shell scripts to cover the desired range of dates.

Note also that in my case the partitions had four keys (year, month, day, hour). If your dates/partitions are encoded as strings (not a good idea, in my opinion), you will have to build the target string from the variables y, m, d and h in the shell script and put that string inside the echo command. The dummy partition (containing only 0s) is there only so that the whole ALTER TABLE command, with its comma-separated list of PARTITION clauses, can be written easily with 3-4 loops.
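For string-typed date partitions like the one in the question, the string-building step could look roughly like the sketch below. It assumes GNU date is available and that partition values are formatted as yyyy-MM-dd. The helper name gen_drop_range, the table name and the column name dt are hypothetical placeholders, not part of the original answer. It tracks a separator variable instead of using the dummy partition.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU date and yyyy-MM-dd partition values.
# gen_drop_range TABLE COLUMN START END   (hypothetical helper)
# Prints one ALTER TABLE statement covering START..END inclusive.
gen_drop_range() {
  local table=$1 col=$2 start=$3 end=$4
  local d=$start sep=""

  # Refuse an empty range; it would produce an invalid statement.
  if [[ "$start" > "$end" ]]; then
    echo "start date is after end date" >&2
    return 1
  fi

  printf 'ALTER TABLE %s DROP IF EXISTS' "$table"
  # ISO-formatted dates sort correctly as plain strings.
  while [[ ! "$d" > "$end" ]]; do
    printf "%s PARTITION (%s='%s')" "$sep" "$col" "$d"
    sep=","
    d=$(date -u -d "$d +1 day" +%F)   # GNU date: advance by one day
  done
  printf ';\n'
}

# Example (placeholder names):
#   gen_drop_range myTable dt 2018-04-14 2018-04-16 > drop_range.hql
```

The generated file can then be run with the -f option in the same way as drop_partitions_v1.hql.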




Answer 2:


I tried this syntax and it worked:

ALTER TABLE mytable DROP PARTITION (dates>'2018-04-14',dates<'2018-04-16');

Command output:

    Dropped the partition dates=2018-04-15/country_id=107
    Dropped the partition dates=2018-04-15/country_id=110
    Dropped the partition dates=2018-04-15/country_id=112
    Dropped the partition dates=2018-04-15/country_id=14
    Dropped the partition dates=2018-04-15/country_id=157
    Dropped the partition dates=2018-04-15/country_id=159
    Dropped the partition dates=2018-04-15/country_id=177
    Dropped the partition dates=2018-04-15/country_id=208
    Dropped the partition dates=2018-04-15/country_id=22
    Dropped the partition dates=2018-04-15/country_id=233
    Dropped the partition dates=2018-04-15/country_id=234
    Dropped the partition dates=2018-04-15/country_id=76
    Dropped the partition dates=2018-04-15/country_id=83
    OK
    Time taken: 0.706 seconds

I am using Hive 1.2.1000.2.5.5.0-157.
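To reuse this comma-separated comparator form for an arbitrary inclusive range, the statement can be built from a small shell helper. This is a sketch under assumptions: drop_range_stmt is a hypothetical name, and it assumes your Hive version accepts >= and <= in the partition spec the same way it accepted > and < above, which is worth checking on a test table first.

```shell
#!/usr/bin/env bash
# Sketch only: drop_range_stmt is a hypothetical helper name.
# drop_range_stmt TABLE COLUMN FROM TO
# Prints a DROP statement whose partition spec uses comparators.
drop_range_stmt() {
  local table=$1 col=$2 from=$3 to=$4
  # Conditions are separated by a comma (not AND), as in the answer above.
  # Assumption: >= and <= are accepted like > and < by the Hive version in use.
  printf "ALTER TABLE %s DROP IF EXISTS PARTITION (%s>='%s', %s<='%s');\n" \
    "$table" "$col" "$from" "$col" "$to"
}

# Example (names taken from the answer above):
#   drop_range_stmt mytable dates 2018-04-14 2018-04-16 > drop_range.hql
```

The printed statement can be saved to a file and run with hive -f, or passed directly with hive -e.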




Answer 3:


Solution:

alter table myTable drop partition (unix_timestamp('date1','yyyy-MM-dd')>unix_timestamp(myDate,'yyyy-MM-dd'),unix_timestamp('date2','yyyy-MM-dd')<unix_timestamp(myDate,'yyyy-MM-dd'));



Source: https://stackoverflow.com/questions/33171704/dropping-a-range-of-partitions-in-hive
