How to choose the latest partition in a BigQuery table?

暗喜 2020-12-09 21:21

I am trying to select data from the latest partition in a date-partitioned BigQuery table, but the query still reads data from the whole table.

I've tried (as far a

7 answers
  •  有刺的猬
    2020-12-09 22:08

    Sorry for digging up this old question, but it came up in a Google search and I think the accepted answer is misleading.

    As far as I can tell from the documentation and running tests, the accepted answer will not prune partitions because a subquery is used to determine the most recent partition:

    Complex queries that require the evaluation of multiple stages of a query in order to resolve the predicate (such as inner queries or subqueries) will not prune partitions from the query.

    So, although the suggested answer will deliver the results you expect, it will still query all partitions. It will not ignore all older partitions and only query the latest.
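
    For illustration, a filter of roughly the following shape is the kind of predicate the quoted passage describes. This is a hypothetical sketch, not the accepted answer's exact query, and the table name is a placeholder:

    ```sql
    -- Hypothetical sketch: the filter value comes from a subquery, so,
    -- per the documentation quoted above, partitions are not pruned.
    SELECT *
    FROM `dataset.partitioned_table`
    WHERE _PARTITIONTIME = (
      SELECT MAX(_PARTITIONTIME) FROM `dataset.partitioned_table`
    );
    ```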

    The trick is to compare against a more-or-less constant expression instead of a subquery. For example, if the table is partitioned daily rather than irregularly, you can prune partitions by selecting yesterday's partition like so:

    -- Standard SQL: quote the table with backticks, not legacy [brackets]
    SELECT * FROM `dataset.partitioned_table`
        WHERE _PARTITIONDATE = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
    

    Sure, this isn't always the latest data, but in my case it happens to be close enough. Use INTERVAL 0 DAY if you want today's data and don't mind that the query returns 0 results for the part of the day before today's partition has been created.
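
    As a minimal sketch of the "today" variant, assuming an ingestion-time partitioned table with daily partitions (so that _PARTITIONDATE is available) and the same placeholder table name, the filter can compare directly against CURRENT_DATE(), which is equivalent to subtracting INTERVAL 0 DAY:

    ```sql
    -- Sketch: select only today's partition (CURRENT_DATE() defaults to UTC).
    -- Returns 0 rows until today's partition has received data.
    SELECT *
    FROM `dataset.partitioned_table`
    WHERE _PARTITIONDATE = CURRENT_DATE();
    ```

    Because the right-hand side is a constant expression rather than a subquery, this keeps the same pruning behaviour described above.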

    I'm happy to learn if there is a better workaround to get the latest partition!
