Hive: Filtering Data between Specified Dates when Date is a String

旧巷少年郎 2021-02-04 02:08

I'm trying to filter data between September 1st, 2010 and August 31st, 2013 in a Hive table. The column containing the date is in string format (yyyy-mm-dd). I can use month()

6 Answers
  • 2021-02-04 02:49

    Hive has a lot of good date parsing UDFs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions

    Just doing the string comparison as Nigel Tufnel suggests is probably the easiest solution, although technically it's unsafe: lexicographic order matches chronological order only when every value is zero-padded to the same width. You probably don't need to worry about that unless your tables hold historical data from the Middle Ages (years with only 3 digits) or dates from sci-fi novels (years with more than 4 digits).

    Anyway, if you ever find yourself in a situation where you would want to do fancier date comparisons, or if your date format is not in a "biggest to smallest" order, e.g. the American convention of "mm/dd/yyyy", then you could use unix_timestamp with two arguments:

    select *
    from your_table
    where unix_timestamp(your_date_column, 'yyyy-MM-dd') >= unix_timestamp('2010-09-01', 'yyyy-MM-dd')
    and unix_timestamp(your_date_column, 'yyyy-MM-dd') <= unix_timestamp('2013-08-31', 'yyyy-MM-dd')
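
    For the "mm/dd/yyyy" case mentioned above, only the pattern passed for the column changes. This is a minimal sketch, assuming a hypothetical table events_us with a string column event_date stored as MM/dd/yyyy (neither name comes from the question):

    ```sql
    -- Hypothetical table and column: events_us.event_date holds strings like '09/15/2010'.
    -- The column is parsed with its own pattern; the literals keep the yyyy-MM-dd pattern.
    SELECT *
    FROM events_us
    WHERE unix_timestamp(event_date, 'MM/dd/yyyy') >= unix_timestamp('2010-09-01', 'yyyy-MM-dd')
      AND unix_timestamp(event_date, 'MM/dd/yyyy') <= unix_timestamp('2013-08-31', 'yyyy-MM-dd');
    ```

    Both sides are compared as epoch seconds, so the textual order of the stored format no longer matters.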
    
  • 2021-02-04 02:56

    The great thing about yyyy-mm-dd date format is that there is no need to extract month() and year(), you can do comparisons directly on strings:

    SELECT *
      FROM your_table
      WHERE your_date_column >= '2010-09-01' AND your_date_column <= '2013-08-31';
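
    Direct string comparison relies on every value really being a zero-padded yyyy-mm-dd string. A quick sanity check might look like the sketch below; the regular expression is my own assumption about what counts as well-formed, and it checks only the shape of the value, not whether the date is valid:

    ```sql
    -- List values that are NULL or do not look like four digits, two digits, two digits.
    SELECT your_date_column, COUNT(*) AS n_rows
    FROM your_table
    WHERE your_date_column IS NULL
       OR NOT (your_date_column RLIKE '^[0-9]{4}-[0-9]{2}-[0-9]{2}$')
    GROUP BY your_date_column;
    ```

    If this returns no rows, the plain string comparison should be safe for this column.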
    
  • 2021-02-04 02:58

    Just like standard SQL, Hive supports the BETWEEN operator, which gives a more concise statement (both bounds are inclusive):

    SELECT *
      FROM your_table
      WHERE your_date_column BETWEEN '2010-09-01' AND '2013-08-31';
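
    If you would rather compare real dates than strings, a possible variant is to cast the column. This is a sketch that assumes a Hive version with the DATE type (0.12 or later, as far as I know) and values that are all in yyyy-MM-dd form:

    ```sql
    -- Cast the string column to DATE and compare against DATE literals.
    -- Rows whose value cannot be cast are expected to become NULL and drop out of the result.
    SELECT *
    FROM your_table
    WHERE CAST(your_date_column AS DATE) BETWEEN DATE '2010-09-01' AND DATE '2013-08-31';
    ```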
    
  • 2021-02-04 03:03

    You have to convert the string to the required date format as follows, and then you can get the result you need. In this example, Order_date is stored as dd-MM-yyyy:

    hive> select * from salesdata01 where from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy'),'yyyy-MM-dd') >= from_unixtime(unix_timestamp('2010-09-01', 'yyyy-MM-dd'),'yyyy-MM-dd') and from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy'),'yyyy-MM-dd') <= from_unixtime(unix_timestamp('2011-09-01', 'yyyy-MM-dd'),'yyyy-MM-dd') limit 10;
    OK
    1   3   13-10-2010  Low 6.0 261.54  0.04    Regular Air -213.25 38.94
    80  483 10-07-2011  High    30.0    4965.7593   0.08    Regular Air 1198.97 195.99
    97  613 17-06-2011  High    12.0    93.54   0.03    Regular Air -54.04  7.3
    98  613 17-06-2011  High    22.0    905.08  0.09    Regular Air 127.7   42.76
    103 643 24-03-2011  High    21.0    2781.82 0.07    Express Air -695.26 138.14
    127 807 23-11-2010  Medium  45.0    196.85  0.01    Regular Air -166.85 4.28
    128 807 23-11-2010  Medium  32.0    124.56  0.04    Regular Air -14.33  3.95
    160 995 30-05-2011  Medium  46.0    1815.49 0.03    Regular Air 782.91  39.89
    229 1539    09-03-2011  Low 33.0    511.83  0.1 Regular Air -172.88 15.99
    230 1539    09-03-2011  Low 38.0    184.99  0.05    Regular Air -144.55 4.89
    Time taken: 0.166 seconds, Fetched: 10 row(s)
    hive> select * from salesdata01 where from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy'),'yyyy-MM-dd') >= from_unixtime(unix_timestamp('2010-09-01', 'yyyy-MM-dd'),'yyyy-MM-dd') and from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy'),'yyyy-MM-dd') <= from_unixtime(unix_timestamp('2010-12-01', 'yyyy-MM-dd'),'yyyy-MM-dd') limit 10;
    OK
    1   3   13-10-2010  Low 6.0 261.54  0.04    Regular Air -213.25 38.94
    127 807 23-11-2010  Medium  45.0    196.85  0.01    Regular Air -166.85 4.28
    128 807 23-11-2010  Medium  32.0    124.56  0.04    Regular Air -14.33  3.95
    256 1792    08-11-2010  Low 28.0    370.48  0.04    Regular Air -5.45   13.48
    381 2631    23-09-2010  Low 27.0    1078.49 0.08    Regular Air 252.66  40.96
    656 4612    19-09-2010  Medium  9.0 89.55   0.06    Regular Air -375.64 4.48
    769 5506    07-11-2010  Critical    22.0    129.62  0.05    Regular Air 4.41    5.88
    1457    10499   16-11-2010  Not Specified   29.0    6250.936    0.01    Delivery Truck  31.21   262.11
    1654    11911   10-11-2010  Critical    25.0    397.84  0.0 Regular Air -14.75  15.22
    2323    16741   30-09-2010  Medium  6.0 157.97  0.01    Regular Air -42.38  22.84
    Time taken: 0.17 seconds, Fetched: 10 row(s)
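
    The literals in the queries above are already in yyyy-MM-dd form, so converting them back and forth should not be necessary. A shorter version of the first query, sketched against the same salesdata01 table and Order_date column, could be:

    ```sql
    -- Reformat Order_date from dd-MM-yyyy to yyyy-MM-dd, then compare it with plain string bounds.
    SELECT *
    FROM salesdata01
    WHERE from_unixtime(unix_timestamp(Order_date, 'dd-MM-yyyy'), 'yyyy-MM-dd')
          BETWEEN '2010-09-01' AND '2011-09-01'
    LIMIT 10;
    ```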
    
  • 2021-02-04 03:04

    No need to extract the month and year. Just use the unix_timestamp(date String, format String) function.

    For example:

    select yourdate_column
    from your_table
    where unix_timestamp(yourdate_column, 'yyyy-MM-dd') >= unix_timestamp('2014-06-02', 'yyyy-MM-dd')
    and unix_timestamp(yourdate_column, 'yyyy-MM-dd') <= unix_timestamp('2014-07-02','yyyy-MM-dd')
    order by yourdate_column limit 10; 
    
  • 2021-02-04 03:13

    Try this (it applies a lower bound only; add an upper bound to filter a closed range):

    select * from your_table
    where date >= '2020-10-01'  
    