BigQuery: filter by the last date and use the partition

清酒与你 2021-01-27 03:27

I asked how to filter on the last date and got excellent answers (see "BigQuery, how to use alias in where clause?"). They all work, but they scan the whole table, the field SETTLEMENT

3 answers
  •  北海茫月
    2021-01-27 04:24

    Mikhail's answer looks like this (working on public data):

    SELECT MAX(views)
    FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
    WHERE DATE(datehour) = DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)     
    AND wiki='es' 
    # 122.2 MB processed
    

    But it seems the question wants something like this:

    SELECT MAX(views)
    FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
    WHERE DATE(datehour) = (SELECT DATE(MAX(datehour)) FROM `fh-bigquery.wikipedia_v3.pageviews_2019` WHERE wiki='es')     
    AND wiki='es'
    # 50.6 GB processed
    

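    ... but scanning far less than 50.6 GB. The subquery is not a constant expression, so BigQuery cannot use it to prune partitions and reads the whole table.

    What you need now is some sort of scripting to perform this in two steps: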

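    # Step 1 (pseudocode): the outer script runs this query and stores the result
    max_date = (SELECT DATE(MAX(datehour)) FROM `fh-bigquery.wikipedia_v3.pageviews_2019` WHERE wiki='es');

    # Step 2: the outer script substitutes the stored value for {{max_date}},
    # so the filter is a constant and partitions can be pruned
    SELECT MAX(views)
    FROM `fh-bigquery.wikipedia_v3.pageviews_2019` 
    WHERE DATE(datehour) = {{max_date}}
    AND wiki='es'
    # 115.2 MB processed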
    

    You will have to script this outside BigQuery - or wait for news on https://issuetracker.google.com/issues/36955074.
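
One way to script the two steps outside BigQuery is a small client-side program. The following is only a sketch under stated assumptions, not the answer's own code: `run_query` is a hypothetical callable that takes SQL text and returns rows, and `make_bigquery_runner` assumes the `google-cloud-bigquery` package is installed and default credentials are configured. The date from step 1 is written into step 2 as a `DATE` literal, so the second query filters on a constant, as in the 115.2 MB example above.

```python
from datetime import date

TABLE = "fh-bigquery.wikipedia_v3.pageviews_2019"

# Step 1: find the latest date that has rows for wiki='es'.
MAX_DATE_SQL = (
    "SELECT DATE(MAX(datehour)) AS max_date\n"
    f"FROM `{TABLE}`\n"
    "WHERE wiki = 'es'"
)


def build_views_query(max_date):
    """Step 2: build the query with the date inlined as a constant literal."""
    if not isinstance(max_date, date):
        raise TypeError("max_date must be a datetime.date")
    # strftime keeps only the date part, even if a datetime is passed in.
    literal = max_date.strftime("%Y-%m-%d")
    return (
        "SELECT MAX(views) AS max_views\n"
        f"FROM `{TABLE}`\n"
        f"WHERE DATE(datehour) = DATE '{literal}'\n"
        "AND wiki = 'es'"
    )


def max_views_on_latest_date(run_query):
    """Run both steps; run_query(sql) is a hypothetical helper returning rows."""
    max_date = run_query(MAX_DATE_SQL)[0]["max_date"]
    if max_date is None:
        return None  # the table has no rows for this wiki
    return run_query(build_views_query(max_date))[0]["max_views"]


def make_bigquery_runner():
    """Assumption: google-cloud-bigquery is installed and credentials are set."""
    from google.cloud import bigquery  # imported lazily so the sketch loads without it

    client = bigquery.Client()

    def run(sql):
        # Each row supports lookup by column name, e.g. row["max_date"].
        return list(client.query(sql).result())

    return run
```

With a real project this could be called as `max_views_on_latest_date(make_bigquery_runner())`. Step 1 still has to find the maximum date on its own, so only the second query is guaranteed to see a constant filter.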
