BigQuery COUNT(DISTINCT value) vs COUNT(value)

情话喂你 2020-11-29 07:35

I found a glitch/bug in BigQuery. We have a table based on Bank Statistic data at starschema.net:clouddb:bank.Banks_token.

If I run the following query:

2 answers
  • 2020-11-29 08:02

    In BigQuery legacy SQL, COUNT(DISTINCT ...) returns a statistical approximation whenever the number of distinct values exceeds 1000.

    You can provide an optional second argument to set the threshold above which approximation is used. So if you use COUNT(DISTINCT BankId, 10000) in your example, you should see the exact result (since the actual number of rows is less than 10000). Note, however, that a larger threshold can be costly in terms of performance.

    See the complete documentation here: https://developers.google.com/bigquery/docs/query-reference#aggfunctions
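
    As a rough illustration of the threshold argument, the legacy SQL query below compares the three counts side by side. It is only a sketch: it assumes the table from the question is readable under the legacy bracket syntax and that BankId is the column being counted. The column aliases are made up for this example.

    ```sql
    #legacySQL
    SELECT
      COUNT(BankId) AS non_null_rows,                   -- counts every non-NULL BankId
      COUNT(DISTINCT BankId) AS approx_distinct,        -- approximate above the default threshold of 1000
      COUNT(DISTINCT BankId, 10000) AS exact_distinct   -- exact as long as there are fewer than 10000 distinct values
    FROM [starschema.net:clouddb:bank.Banks_token]
    ```

    If approx_distinct and exact_distinct differ, the difference comes from the approximation rather than from the data.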


    UPDATE 2017:

    With BigQuery #standardSQL, COUNT(DISTINCT) is always exact. For approximate results, use APPROX_COUNT_DISTINCT(). Why would anyone use approximate results? See this article.
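
    A minimal standard SQL sketch of the same comparison follows. It assumes the same table and BankId column as the question; the backtick-quoted table path and the aliases are assumptions made for illustration.

    ```sql
    #standardSQL
    SELECT
      COUNT(BankId) AS non_null_rows,                   -- counts every non-NULL BankId
      COUNT(DISTINCT BankId) AS exact_distinct,         -- always exact in standard SQL
      APPROX_COUNT_DISTINCT(BankId) AS approx_distinct  -- approximate, typically cheaper on large inputs
    FROM `starschema.net:clouddb.bank.Banks_token`
    ```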

  • 2020-11-29 08:06

    I've used EXACT_COUNT_DISTINCT() to get the exact unique count. It's cleaner and more general than COUNT(DISTINCT value, n > numRows).

    Found here: https://cloud.google.com/bigquery/query-reference#aggfunctions
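
    For reference, a query using it might look like the sketch below. EXACT_COUNT_DISTINCT() is a legacy SQL function, and the table and BankId column are assumed from the question; the alias is hypothetical.

    ```sql
    #legacySQL
    SELECT
      EXACT_COUNT_DISTINCT(BankId) AS exact_distinct  -- exact count, no threshold to guess
    FROM [starschema.net:clouddb:bank.Banks_token]
    ```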
