How exactly is the value of count(*) determined in BigQuery?

后端 未结 2 1521
[愿得一人]
[愿得一人] 2021-01-13 03:14

I am joining a table of about 70000 rows with a slightly bigger second table through inner join each. Now count(a.business_column) and count(*) give different results. The f

相关标签:
2条回答
  • 2021-01-13 03:54

    UPDATE: For a step by step explanation on how this works, see BigQuery flattens when using field with same name as repeated field instead.


    To answer the title question: COUNT(*) in BigQuery is always accurate.

    The caveat is that in SQL COUNT(*) and COUNT(column) have semantically different meanings - and the sample query can be interpreted in different ways.

    See: http://www.xaprb.com/blog/2009/04/08/the-dangerous-subtleties-of-left-join-and-count-in-sql/

    There they have this sample query:

    select user.userid, count(email.subject)
    from user
       inner join email on user.userid = email.userid
    group by user.userid;
    

    That query turns out to be ambigous, and the article author changes it for a more explicit one, adding this comment:

    But what if that’s not what the author of the query meant? There’s no way to really know. There are several possible intended meanings for the query, and there are several different ways to write the query to express those meanings more clearly. But the original query is ambiguous, for a few reasons. And everyone who reads this query afterwards will end up guessing what the original author meant. “I think I can safely change this to…”


    UPDATE: For a step by step explanation on how this works, see BigQuery flattens when using field with same name as repeated field instead.

    0 讨论(0)
  • 2021-01-13 04:10

    COUNT(*) counts most repeated field in your query, if you want to count full records - use COUNT(0).

    0 讨论(0)
提交回复
热议问题