Partition by week/month/quarter/year to get over the partition limit?

醉梦人生 2020-11-27 20:37

I have 32 years of data that I want to put into a partitioned table. However, BigQuery says that I'm going over the limit (4,000 partitions).

For a query like:

<
2 Answers
  •  有刺的猬
    2020-11-27 21:22

    As an alternative, I created a NOAA GSOD summary table clustered by station name. Instead of partitioning it by day, I didn't partition it at all.
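
    A minimal sketch of how a clustered, non-partitioned table of this kind could be built. This is not necessarily how the table above was created: the destination name `my_project.my_dataset.gsod_all` is hypothetical, the source tables and join are borrowed from the non-clustered query in this answer, and the date parsing assumes `year`, `mo` and `da` are zero-padded strings.

    ```sql
    -- Sketch only: the destination table name is hypothetical.
    CREATE TABLE `my_project.my_dataset.gsod_all`
    -- No PARTITION BY clause: the table is clustered only,
    -- so the 4,000-partition limit never comes into play.
    CLUSTER BY name, date
    AS
    SELECT
      b.name,
      b.state,
      -- Assumes year/mo/da are zero-padded strings such as '1980', '01', '01'.
      PARSE_DATE('%Y%m%d', CONCAT(a.year, a.mo, a.da)) AS date,
      a.temp
    FROM `bigquery-public-data.noaa_gsod.gsod*` a
    JOIN `bigquery-public-data.noaa_gsod.stations` b
      ON a.wban = b.wban AND a.stn = b.usaf;
    ```

    If date-based pruning is still wanted, BigQuery's DDL also accepts coarser partitioning units, for example `PARTITION BY DATE_TRUNC(date, YEAR)`, which for 32 years of data would give roughly one partition per year, well under the limit.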

    Let's say I want to find the hottest days since 1980 for all stations with a name like SAN FRANC%:

    SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) active_until
    FROM `fh-bigquery.weather_gsod.all` 
    WHERE name LIKE 'SAN FRANC%'
    AND date > '1980-01-01'
    GROUP BY 1,2
    ORDER BY active_until DESC
    

    Note that I got the results after processing only 55.2MB of data.

    The equivalent query on the source tables (without clustering) processes 4GB instead:

    # query on non-clustered tables - too much data compared to the other one
    SELECT name, state, ARRAY_AGG(STRUCT(CONCAT(a.year,a.mo,a.da),temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(CONCAT(a.year,a.mo,a.da)) active_until
    FROM `bigquery-public-data.noaa_gsod.gsod*` a
    JOIN `bigquery-public-data.noaa_gsod.stations`  b
    ON a.wban=b.wban AND a.stn=b.usaf
    WHERE name LIKE 'SAN FRANC%'
    AND _table_suffix >= '1980'
    GROUP BY 1,2
    ORDER BY active_until DESC
    

    I also added a geo-clustered table, to search by location instead of by station name. See details here: https://stackoverflow.com/a/34804655/132438
