Return only the newest rows from a BigQuery table with duplicate items

余生分开走 2020-12-17 01:59

I have a table with many duplicate items: many rows share the same id, and the only difference between them may be the requested_at column.

I'd l

2 answers
  • 2020-12-17 02:14

    I suggest a similar form that avoids a sort in the window function:

        SELECT *
        FROM (
          SELECT
              *,
              MAX(<timestamp_column>)
                  OVER (PARTITION BY <id_column>)
                  AS max_timestamp,
          FROM <table>
        )
        WHERE <timestamp_column> = max_timestamp
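
One property of the MAX form is worth seeing on data: if several rows for the same id share the newest timestamp, all of them are returned. The sketch below is only an illustration and makes several assumptions: it runs the same query shape against an in-memory SQLite database (version 3.25 or later, for window functions) rather than BigQuery, and the table name requests, the payload column, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: id 1 has two rows that tie on the newest timestamp.
rows = [
    (1, "2020-12-01 10:00:00", "a"),
    (1, "2020-12-02 10:00:00", "b"),
    (1, "2020-12-02 10:00:00", "c"),
    (2, "2020-12-01 09:00:00", "d"),
]

# SQLite stands in for BigQuery here; window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER, requested_at TEXT, payload TEXT)")
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)

# Same shape as the query above: compute the per-id maximum, then keep
# only the rows whose timestamp equals that maximum.
max_query = """
SELECT id, requested_at, payload
FROM (
  SELECT
      *,
      MAX(requested_at) OVER (PARTITION BY id) AS max_timestamp
  FROM requests
)
WHERE requested_at = max_timestamp
ORDER BY id, payload
"""
newest = conn.execute(max_query).fetchall()
print(newest)
```

With this data, id 1 comes back twice (payloads b and c), so this form only deduplicates fully when the newest timestamp is unique within each id.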
    
  • 2020-12-17 02:16

    Try something like this:

        SELECT *
        FROM (
          SELECT
              *,
              ROW_NUMBER()
                  OVER (
                      PARTITION BY <id_column>
                      ORDER BY <timestamp_column> DESC)
                  AS row_number,
          FROM <table>
        )
        WHERE row_number = 1
    

    Note that this adds a row_number column to the output, which you might not want. To drop it, list the columns you need by name in the outer SELECT instead of using SELECT * (in standard SQL, SELECT * EXCEPT (row_number) also works).

    In your case, it sounds like the requested_at column is the one you want to use in the ORDER BY.
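
As a rough illustration of the ROW_NUMBER form with requested_at in the ORDER BY, the sketch below keeps exactly one row per id and names the output columns explicitly so the helper column is not returned. It rests on assumptions: SQLite (3.25 or later) stands in for BigQuery, the table name requests, the payload column, and the sample rows are hypothetical, the helper column is aliased rn, and the payload tie-breaker is an addition that only makes the choice between tied rows deterministic.

```python
import sqlite3

# Hypothetical sample data: id 1 has two rows that tie on the newest timestamp.
rows = [
    (1, "2020-12-01 10:00:00", "a"),
    (1, "2020-12-02 10:00:00", "b"),
    (1, "2020-12-02 10:00:00", "c"),
    (2, "2020-12-01 09:00:00", "d"),
]

# SQLite stands in for BigQuery here; window functions need SQLite >= 3.25.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER, requested_at TEXT, payload TEXT)")
conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", rows)

# Number the rows of each id from newest to oldest and keep number 1.
# The outer SELECT lists columns by name, so the helper column rn is dropped.
row_number_query = """
SELECT id, requested_at, payload
FROM (
  SELECT
      *,
      ROW_NUMBER() OVER (
          PARTITION BY id
          ORDER BY requested_at DESC, payload DESC  -- payload is an assumed tie-breaker
      ) AS rn
  FROM requests
)
WHERE rn = 1
ORDER BY id
"""
latest = conn.execute(row_number_query).fetchall()
print(latest)
```

Unlike a filter on the maximum timestamp, this returns a single row per id even when timestamps tie; without a tie-breaker in the ORDER BY, which of the tied rows survives is not guaranteed.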

    If the result is large, you will also want to enable allow_large_results, set a destination table, and turn off flattening of results if your schema has repeated fields. These options apply to legacy SQL queries.
