How to find the most recent partition in a Hive table

清酒与你 2020-12-31 05:41

I have a partitioned table with 201 partitions. I need to find the latest partition in this table and use it to post-process my data. The query to find list of all partitions

4 answers
  • 2020-12-31 06:10

    If you want to avoid running "show partitions" in the Hive shell, as suggested in another answer, you can apply a filter to your max() query. The filter avoids a full table scan, so the result should come back fairly quickly.

    For example, the following query scans only the last 2-3 partitions:

    select max(ingest_date) from db.table_name where ingest_date > date_add(current_date, -3)

    If no partition falls within that window, the query returns NULL, so widen the range as needed.
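
    The lookback window in that query can be made adjustable. The following is only a minimal sketch, assuming a small shell wrapper is acceptable. The function name build_latest_partition_query is hypothetical; db.table_name and ingest_date are the names used in the query above.

```shell
# Hypothetical helper (not part of Hive): prints a max() query restricted to
# the last N days, so that only a few partitions need to be scanned.
# Arguments: $1 = table, $2 = partition column, $3 = lookback in days.
build_latest_partition_query() {
  printf 'select max(%s) from %s where %s > date_add(current_date, -%s)\n' \
    "$2" "$1" "$2" "$3"
}
```

    It could then be run as hive -e "$(build_latest_partition_query db.table_name ingest_date 3)", with a larger last argument if the result comes back NULL.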

  • 2020-12-31 06:21

    If you know your table's location in HDFS, this is the quickest way, and you do not even need to open the Hive shell.

    You can look up the table's location in HDFS with the command:

    show create table <table_name>
    

    Then run:

    hdfs dfs -ls <table_path>| sort -k6,7 | tail -1
    

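    This sorts the listing by modification date and time and shows the most recently modified partition directory in HDFS, which is normally the latest partition.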
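
    Sorting by modification time can pick the wrong directory if an older partition was rewritten recently. A hedged alternative is to sort by partition name instead. The sketch below assumes single-level partitions named key=value whose values sort lexicographically (for example ISO dates) and paths without spaces; the function name latest_partition_dir is hypothetical.

```shell
# Hypothetical helper: reads `hdfs dfs -ls <table_path>` output on stdin and
# prints the partition directory whose path sorts last by name.
latest_partition_dir() {
  # Keep directory entries (permissions start with "d") whose last path
  # component looks like key=value, and print only the path column.
  awk '$1 ~ /^d/ && $NF ~ "/[^/=]+=[^/]+$" { print $NF }' |
    LC_ALL=C sort |   # byte-wise sort, independent of the user's locale
    tail -n 1         # the last entry is the greatest partition name
}
```

    Usage would be hdfs dfs -ls <table_path> | latest_partition_dir.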

  • 2020-12-31 06:29

    It looks like there is no way to query for the last partition via the Hive (or Beeline) CLI that checks only metadata (as one would expect).

    For the sake of completeness, the alternative I would propose to the bash parsing answer is to query the metastore directly. This can easily be extended to more complex functions of ingest_date than just taking the max. For instance, for a MySQL metastore I've used:

    SELECT MAX(PARTITIONS.PART_NAME) FROM
    DBS
    INNER JOIN
    TBLS ON DBS.DB_ID = TBLS.DB_ID
    INNER JOIN
    PARTITIONS ON TBLS.TBL_ID = PARTITIONS.TBL_ID
    WHERE DBS.NAME = 'db'
    AND TBLS.TBL_NAME = 'my_table'
    

    The output will then be in the format partition_name=partition_value.

  • You can use "show partitions":

    hive -e "set hive.cli.print.header=false;show partitions table_name;" | tail -1 | cut -d'=' -f2
    

    This prints the value of the last partition listed, for example "2016-03-09".
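
    The one-liner relies on "show partitions" listing the latest partition last. A slightly more defensive sketch sorts the values explicitly; it assumes a single-level partition whose values sort lexicographically (such as dates in yyyy-MM-dd form), and the function name latest_partition_value is hypothetical.

```shell
# Hypothetical helper: reads `show partitions` output (one key=value per line)
# on stdin and prints the greatest partition value.
latest_partition_value() {
  grep '=' |            # drop blank lines and anything that is not key=value
    cut -d'=' -f2- |    # keep everything after the first "="
    LC_ALL=C sort |     # byte-wise sort, independent of the user's locale
    tail -n 1           # the last line is the greatest value
}
```

    Usage would be hive -e "set hive.cli.print.header=false;show partitions table_name;" | latest_partition_value.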
