Split a table into multiple tables by date in BigQuery with a single query, for partitioning


孤城傲影 2021-01-27 01:34

The original "why" of what I want to do is:

Restore a table maintaining its original partitioning instead of it all going into today's partition.

What I thoug

2 answers

春和景丽 (original poster)

2021-01-27 02:31
The main problem here is the full scan for each and every day. The rest is less of a problem and can easily be scripted in any client of your choice.

So the question below is: how do you avoid a full table scan for each and every day?

Follow the steps below to see the approach.
It is generic enough to extend to your real case. Meanwhile, I am using the same example as in your question and limiting the exercise to just 10 days.

Step 1 – Create the pivot table
In this step we a) compress each row's content into a record/array and b) put them all into the respective "daily" column
```
#standardSQL
SELECT
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160101' THEN r END) AS day20160101,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160102' THEN r END) AS day20160102,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160103' THEN r END) AS day20160103,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160104' THEN r END) AS day20160104,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160105' THEN r END) AS day20160105,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160106' THEN r END) AS day20160106,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160107' THEN r END) AS day20160107,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160108' THEN r END) AS day20160108,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160109' THEN r END) AS day20160109,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160110' THEN r END) AS day20160110
FROM (
  SELECT d, r, ROW_NUMBER() OVER(PARTITION BY d) AS line
  FROM (
    SELECT 
      stn, CONCAT('day', year, mo, da) AS d, ARRAY_AGG(t) AS r
    FROM `bigquery-public-data.noaa_gsod.gsod2016` AS t 
    GROUP BY stn, d
  ) 
)
GROUP BY line
```
Run the above query in the Web UI with pivot_table (you can choose whatever name you want here) as the destination table.

As you can see, this gives a table with 10 columns, one column per day, and the schema of each column is a copy of the schema of the original table.
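
To make the shape of that pivot table concrete, here is a small in-memory imitation of Step 1 in Python. It is only a sketch of the idea, not BigQuery code: the rows are made up, and `pivot` and `unnest` are hypothetical helper names that mirror the `GROUP BY`, `ROW_NUMBER()` and `UNNEST` steps of the queries in this answer.

```python
from collections import defaultdict

# Made-up rows standing in for the source table: (stn, day, temp).
ROWS = [
    ("A", "20160101", 1.0),
    ("B", "20160101", 2.0),
    ("A", "20160102", 3.0),
    ("A", "20160101", 4.0),
]

def pivot(rows):
    """Mimic Step 1: one output column per day, each cell a list of source rows."""
    # Inner query: GROUP BY stn, d with ARRAY_AGG(t) AS r.
    groups = defaultdict(list)
    for row in rows:
        stn, day, _temp = row
        groups[(stn, day)].append(row)
    # ROW_NUMBER() OVER(PARTITION BY d): position of each group within its day.
    per_day = defaultdict(list)
    for (_stn, day), r in groups.items():
        per_day[day].append(r)
    # GROUP BY line: one output row per line number; shorter days get None.
    height = max(len(cells) for cells in per_day.values())
    return [
        {f"day{day}": (cells[line] if line < len(cells) else None)
         for day, cells in per_day.items()}
        for line in range(height)
    ]

def unnest(table, column):
    """Mimic Step 2: flatten one daily column back into plain rows."""
    return [row for record in table if record[column] for row in record[column]]

table = pivot(ROWS)
print(len(table), sorted(table[0]))
print(unnest(table, "day20160102"))
```

Reading a single column back with `unnest` touches only that day's cells, which is the property the next step relies on.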

Step 2 – Create the sharded tables one by one, scanning ONLY the respective column (no full table scan)
```
#standardSQL
SELECT r.*
FROM pivot_table, UNNEST(day20160101) AS r
```
Run the above query from the Web UI with a destination table named mytable_20160101

You can run the same for the next day
```
#standardSQL
SELECT r.*
FROM pivot_table, UNNEST(day20160102) AS r
```
Now you should have a destination table named mytable_20160102, and so on.
You should be able to automate/script this step with any client of your choice.
Note: those final daily tables will have exactly the same schema as the original table!
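
As one way to script this step, the per-day queries and their destination table names can be generated rather than typed. The Python sketch below only builds the strings; `build_shard_jobs` is a hypothetical helper, the `mytable_` prefix and `pivot_table` name are the ones used above, and submitting each query with its destination table is left to whichever client you use.

```python
def build_shard_jobs(pivot_table, day_columns, prefix="mytable_"):
    """Return (destination_table, sql) pairs, one per daily column."""
    jobs = []
    for col in day_columns:
        # 'day20160101' -> '20160101', used as the shard suffix.
        suffix = col[len("day"):]
        sql = (
            "#standardSQL\n"
            "SELECT r.*\n"
            f"FROM {pivot_table}, UNNEST({col}) AS r"
        )
        jobs.append((f"{prefix}{suffix}", sql))
    return jobs


if __name__ == "__main__":
    # Column names as created in Step 1 (first two days only, for brevity).
    for destination, sql in build_shard_jobs(
        "pivot_table", ["day20160101", "day20160102"]
    ):
        print(destination)
        print(sql)
        print()
```

Each generated query reads a single column of the pivot table, so the number of jobs grows with the number of days without any of them rescanning the source table.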

There are many variations of how you can use the above approach; it is up to your creativity.

Note: BigQuery allows up to 10,000 columns per table, so 365 columns for the respective days of one year is definitely not a problem here :o)
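
Writing a few hundred column expressions by hand is tedious, so the Step 1 query can be generated for any date range. This is a rough Python sketch under a few assumptions: `day_columns` and `build_pivot_query` are hypothetical names, the source table has the `stn`, `year`, `mo` and `da` columns used above, and `MAX_COLUMNS` simply restates the 10,000-column limit from the note.

```python
from datetime import date, timedelta

MAX_COLUMNS = 10000  # column limit quoted in the note above


def day_columns(start, end):
    """Return labels such as 'day20160101' for each date in [start, end]."""
    n_days = (end - start).days + 1
    return [f"day{start + timedelta(days=i):%Y%m%d}" for i in range(n_days)]


def build_pivot_query(source_table, start, end):
    """Build the Step 1 pivot query with one column per day in the range."""
    cols = day_columns(start, end)
    if len(cols) > MAX_COLUMNS:
        raise ValueError(f"{len(cols)} daily columns exceed {MAX_COLUMNS}")
    select_list = ",\n".join(
        f"  ARRAY_CONCAT_AGG(CASE WHEN d = '{c}' THEN r END) AS {c}" for c in cols
    )
    return (
        "#standardSQL\n"
        "SELECT\n"
        f"{select_list}\n"
        "FROM (\n"
        "  SELECT d, r, ROW_NUMBER() OVER(PARTITION BY d) AS line\n"
        "  FROM (\n"
        "    SELECT stn, CONCAT('day', year, mo, da) AS d, ARRAY_AGG(t) AS r\n"
        f"    FROM `{source_table}` AS t\n"
        "    GROUP BY stn, d\n"
        "  )\n"
        ")\n"
        "GROUP BY line"
    )


if __name__ == "__main__":
    print(build_pivot_query(
        "bigquery-public-data.noaa_gsod.gsod2016",
        date(2016, 1, 1),
        date(2016, 1, 10),
    ))
```

With the 10-day range shown, the output should match the hand-written Step 1 query apart from whitespace; widening the range to a full year only changes the number of generated columns.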