How clustering works in BigQuery

Submitted by 孤街浪徒 on 2021-02-16 15:32:07

Question


I have a table UNITARCHIVE partitioned by date and clustered by UNIT, DUID.

The total size of the table is 892 MB.

When I try this query:

SELECT * FROM `test-187010.ReportingDataset.UNITARCHIVE` WHERE duid="RRSF1" and unit="DUNIT"

BigQuery tells me it will process 892 MB. I thought clustering was supposed to reduce the scanned size. I understand that when I filter by date the size is reduced dramatically, but I need the whole date range. Is this by design, or am I doing something wrong?


Answer 1:


To get the most benefits out of clustering, each partition needs to have a certain amount of data.

For example, if the minimum size of a cluster is 100MB (decided internally by BigQuery), and you have only 100MB of data per day, then querying 100 days will scan 100*100MB - regardless of the clustering strategy.

As an alternative with this amount of data, instead of partitioning by day, partition by year. Then you'll get the most benefits out of clustering with a low amount of data per day.
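A minimal sketch of what re-creating the table with yearly partitions might look like. The question never names the partitioning column, so `event_date` here is a hypothetical placeholder for it (assumed to be of type DATE), and `UNITARCHIVE_BY_YEAR` is a hypothetical name for the new table:

```sql
-- Sketch only: `event_date` is a hypothetical name for the DATE column the
-- original table is partitioned on; UNITARCHIVE_BY_YEAR is a hypothetical
-- name for the new table. If the column is a TIMESTAMP, use
-- TIMESTAMP_TRUNC(event_date, YEAR) instead of DATE_TRUNC.
CREATE TABLE `test-187010.ReportingDataset.UNITARCHIVE_BY_YEAR`
PARTITION BY DATE_TRUNC(event_date, YEAR)  -- one partition per year instead of per day
CLUSTER BY UNIT, DUID                      -- same clustering columns as the original table
AS
SELECT *
FROM `test-187010.ReportingDataset.UNITARCHIVE`;
```

Running the query from the question against the new table, with the same filters on duid and unit, should show whether the estimated bytes processed drops. The actual saving depends on how BigQuery lays out the clustered blocks.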

See "Partition by week/year/month to get over the partition limit?" for a reference table that shows this off.



Source: https://stackoverflow.com/questions/57966914/how-clustering-works-in-bigquery
