We have a use case with hundreds of millions of entries in a table and have trouble splitting it up further. 99% of operations are append-only. However, we occasionally need to update existing rows.
So, to add more to my comment:
Why not just accept each update as a new row in your table, and have your queries read only the latest row per user? That's much easier.
Create a view like this:
select * from (
  SELECT
    rank() over (partition by user_id order by timestamp desc) as _rank,
    *
  FROM [db.userupdate]
) where _rank = 1
then update your queries to read from the view instead of the base table, and you are done.
Some context on how we use this. We have an events table that holds user profile data. On every update we append the complete profile row again in BQ. That means we end up with versioned content: as many rows for a user_id as the number of updates that user has made. This is all in the same table, and the timestamp tells us the order of the updates. Let's say the table is [userupdate]. If we do a
select * from userupdate where user_id=10
it will return all the updates this user has made to their profile, in no guaranteed order.
But we also created a view, once, with the syntax above. Now when we run:
select * from userupdate_last where user_id=10 #notice the table name changed to view name
it returns only one row: the user's latest. So wherever we want only the latest of a bunch of append-only rows, we just swap the table name for the view name in the query.
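The whole pattern can be sketched end-to-end with sqlite3 standing in for BigQuery (the table, view, and column names follow the post; the profile data is made up, and the bracketed legacy-SQL table syntax is dropped since it is BigQuery-specific):

```python
import sqlite3

# Append-only "userupdate" table plus a "userupdate_last" view that
# keeps only each user's most recent row, as described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userupdate (user_id INTEGER, name TEXT, timestamp INTEGER);

CREATE VIEW userupdate_last AS
SELECT * FROM (
  SELECT
    rank() OVER (PARTITION BY user_id ORDER BY timestamp DESC) AS _rank,
    *
  FROM userupdate
) WHERE _rank = 1;
""")

# Every profile "update" is just another appended row.
conn.executemany(
    "INSERT INTO userupdate VALUES (?, ?, ?)",
    [(10, "Alice", 1), (10, "Alice B.", 2), (11, "Bob", 1)],
)

# Base table: every version the user ever wrote.
print(conn.execute(
    "SELECT name FROM userupdate WHERE user_id = 10").fetchall())
# -> [('Alice',), ('Alice B.',)]

# View: only the latest version, same WHERE clause.
print(conn.execute(
    "SELECT name FROM userupdate_last WHERE user_id = 10").fetchall())
# -> [('Alice B.',)]
```

One caveat worth knowing: rank() returns every row tied for the latest timestamp, so if two updates can share a timestamp and you want exactly one row, row_number() is the safer window function here.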