Return only the newest rows from a BigQuery table with duplicate items


I have a table with many duplicate items: many rows share the same id, perhaps differing only in a requested_at column.

I'd l…


2 Answers

旧巷少年郎

2020-12-17 02:14

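I suggest a form similar to the ROW_NUMBER() approach, one that avoids a sort in the window function:

```
SELECT *
FROM (
  SELECT
      *,
      MAX(<timestamp_column>)
          OVER (PARTITION BY <id_column>)
          AS max_timestamp
  FROM <table>
)
WHERE <timestamp_column> = max_timestamp
```

Note that if several rows for the same id share the latest timestamp, this returns all of them, whereas ROW_NUMBER() keeps exactly one row per id.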
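A self-contained sketch of the tie behavior, in BigQuery standard SQL. The `requests` CTE and its `label` column are hypothetical inline data standing in for `<table>`; id 1 has two rows that share its latest `requested_at`.

```sql
-- Hypothetical inline data standing in for <table>.
-- id 1 has two rows tied for the latest requested_at.
WITH requests AS (
  SELECT 1 AS id, TIMESTAMP '2020-12-01 10:00:00' AS requested_at, 'a' AS label
  UNION ALL SELECT 1, TIMESTAMP '2020-12-02 09:00:00', 'b'
  UNION ALL SELECT 1, TIMESTAMP '2020-12-02 09:00:00', 'c'
  UNION ALL SELECT 2, TIMESTAMP '2020-12-01 08:00:00', 'd'
)
SELECT id, requested_at, label
FROM (
  SELECT
    *,
    -- Latest timestamp per id; no ORDER BY is needed inside the window.
    MAX(requested_at) OVER (PARTITION BY id) AS max_timestamp
  FROM requests
)
WHERE requested_at = max_timestamp
ORDER BY id, label
-- Returns three rows: labels 'b' and 'c' for id 1 (the tie) and 'd' for id 2.
```

If exactly one row per id is required, the ROW_NUMBER() form is the safer choice.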


面向向阳花

2020-12-17 02:16
Try something like this:
```
SELECT *
FROM (
  SELECT
      *,
      ROW_NUMBER()
          OVER (
              PARTITION BY <id_column>
              ORDER BY <timestamp_column> DESC)
          AS row_number
  FROM <table>
)
WHERE row_number = 1
```
Note that this adds a row_number column to the output, which you might not want. To avoid it, select the columns you need by name in the outer SELECT statement.

In your case, it sounds like id is the column to use in the PARTITION BY and requested_at is the one to use in the ORDER BY.
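A minimal runnable sketch for the question's columns, in BigQuery standard SQL. The inline `requests` data and its `label` column are hypothetical. Instead of listing columns by name, it uses standard SQL's `SELECT * EXCEPT (...)`, which returns every column except the ones named, to drop the helper `row_number` column.

```sql
-- Hypothetical inline data standing in for the real table.
WITH requests AS (
  SELECT 1 AS id, TIMESTAMP '2020-12-01 10:00:00' AS requested_at, 'a' AS label
  UNION ALL SELECT 1, TIMESTAMP '2020-12-02 09:00:00', 'b'
  UNION ALL SELECT 2, TIMESTAMP '2020-12-01 08:00:00', 'c'
)
-- EXCEPT (standard SQL only) keeps every column except the helper row_number.
SELECT * EXCEPT (row_number)
FROM (
  SELECT
    *,
    -- Number each id's rows from newest (1) to oldest.
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY requested_at DESC
    ) AS row_number
  FROM requests
)
WHERE row_number = 1
ORDER BY id
-- Returns one row per id: label 'b' for id 1 and label 'c' for id 2.
```

Listing the columns explicitly works in both legacy and standard SQL; `EXCEPT` is available only in standard SQL.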

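You will also want to enable allow_large_results, set a destination table, and disable flattening of results (if your schema has repeated fields). allow_large_results and result flattening are legacy SQL settings; standard SQL queries ignore them, always allow large results, and never flatten.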
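In standard SQL, one way to write the deduplicated rows to a destination table is a `CREATE OR REPLACE TABLE ... AS SELECT` statement. A minimal sketch; `mydataset.requests` and `mydataset.latest_requests` are hypothetical names for the source and destination tables.

```sql
-- Hypothetical source and destination tables; replace with your own.
CREATE OR REPLACE TABLE mydataset.latest_requests AS
SELECT * EXCEPT (row_number)
FROM (
  SELECT
    *,
    -- Keep the newest row per id.
    ROW_NUMBER() OVER (
      PARTITION BY id
      ORDER BY requested_at DESC
    ) AS row_number
  FROM mydataset.requests
)
WHERE row_number = 1
```

Re-running the statement overwrites the destination table with the current set of newest rows.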