BigQuery flattens when using field with same name as repeated field

Happy的楠姐 2021-01-26 22:10

Edited to use public dataset

I have a table with the following schema, which you can access here: https://bigquery.cloud.google.com/table/reals

3 answers
  •  臣服心动
    2021-01-26 23:07

    I'm adding a new answer because you keep adding elements to the question, and each of them deserves its own answer.

    You say this query surprises you:

    SELECT COUNT(*), COUNT(0)
    FROM (
      SELECT dr_id, cover_photos.is_published
      FROM [realself-main:rs_public.test_count] )
    

    You are surprised because the results are 7 and 3.

    Maybe it will make sense if I try this:

    SELECT COUNT(*), COUNT(0), 
           GROUP_CONCAT(STRING(cover_photos.is_published)),
           GROUP_CONCAT(STRING(dr_id)), 
           GROUP_CONCAT(IFNULL(STRING(cover_photos.is_published),'null')),
           GROUP_CONCAT("0")
    FROM (
      SELECT dr_id, cover_photos.is_published
      FROM [realself-main:rs_public.test_count] 
    )
    

    See? It's the same query, plus four extra aggregations over the same sub-columns. One of those sub-columns holds nested repeated data and also has a null value in one row.

    The results of the query are:

    7    3    1,1,1,0,0,0    1234,4321,9999    null,1,1,1,0,0,0    0,0,0
    

    The 7 comes from the full expansion of the nested data into 7 rows, as the 5th column hints. The 3 comes from evaluating the constant "0" just three times, once per record, as the 6th column shows.
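    To make the two counts concrete, here is a toy model in Python. It is only a sketch of the behavior described above, not how BigQuery is implemented, and the way the seven values are split across the three dr_id records is an assumption (the real table layout is not shown here). The helper name flatten is hypothetical.

```python
# Toy model of the counts above. The record layout is an ASSUMPTION
# chosen to reproduce the reported results; it is not the real table.
records = [
    {"dr_id": 1234, "is_published": []},         # no cover photos -> one NULL row
    {"dr_id": 4321, "is_published": [1, 1, 1]},
    {"dr_id": 9999, "is_published": [0, 0, 0]},
]


def flatten(record):
    """Expand a record into one row per repeated value (one NULL row if empty)."""
    values = record["is_published"] or [None]
    return [(record["dr_id"], value) for value in values]


rows = [row for record in records for row in flatten(record)]

count_star = len(rows)       # models COUNT(*): one per flattened row
count_const = len(records)   # models COUNT(0): the constant is evaluated once per record

# Models GROUP_CONCAT(STRING(cover_photos.is_published)): NULLs are skipped.
published = ",".join(str(v) for _, v in rows if v is not None)
# Models GROUP_CONCAT(STRING(dr_id)): dr_id is not repeated, so one value per record.
dr_ids = ",".join(str(r["dr_id"]) for r in records)
# Models GROUP_CONCAT(IFNULL(STRING(...), 'null')): the NULL row becomes visible.
published_or_null = ",".join("null" if v is None else str(v) for _, v in rows)
# Models GROUP_CONCAT("0"): one constant per record.
zeros = ",".join("0" for _ in records)

print(count_star, count_const, published, dr_ids, published_or_null, zeros)
```

    Under these assumptions the six values match the six result columns: the flattened rows drive the 7, while anything evaluated at record level (the constant and dr_id) appears only three times.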

    These subtleties all come from working with nested repeated data. I'd advise you not to work with it until you are ready to accept that results like these can happen.
