Split table into multiple tables based on date using bigquery with a single query for partitioning

Asked by 孤城傲影 on 2021-01-27 01:34 · 2 answers · 1913 views

The original "why" of what I want to do is:

Restore a table maintaining its original partitioning, instead of it all going into today's partition.

What I thoug

2 Answers
  • 2021-01-27 02:15

    Answering myself here. Another approach I've seen done is to write a script that:

    1. Parses the tablebackup.json file and outputs many tablebackuppartitionYYYYMMDD.json files, split on a provided parameter.

    2. Creates a batch script to `bq load` all the files into the appropriate table partitions.

    The script would need to process the file row by row or in chunks to be able to handle massive backups, and it would take some time. The advantage of this method is that it is generic and usable by a sysadmin untrained in BigQuery.
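    A minimal sketch of such a script, assuming a newline-delimited JSON export with a date field — the field name `date`, the table name, and the file-naming pattern below are all hypothetical placeholders:

```python
import json
from collections import defaultdict

def split_backup(lines, date_field="date"):
    """Group newline-delimited JSON rows by their date (YYYYMMDD)."""
    partitions = defaultdict(list)
    for line in lines:
        row = json.loads(line)
        # Normalize e.g. "2016-01-01" to "20160101" for the partition suffix
        day = str(row[date_field]).replace("-", "")
        partitions[day].append(line)
    return partitions

def bq_load_commands(partitions, table="mydataset.mytable"):
    """Emit one `bq load` command per daily file, targeting each
    table partition via the $YYYYMMDD partition decorator."""
    return [
        f"bq load --source_format=NEWLINE_DELIMITED_JSON "
        f"{table}${day} tablebackup_{day}.json"
        for day in sorted(partitions)
    ]
```

    For a massive backup you would stream the file and flush each partition's buffer to disk periodically, rather than holding everything in memory as this sketch does.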

  • 2021-01-27 02:31

    The main problem here is the full table scan for each and every day. The rest is less of a problem and can easily be scripted in any client of your choice.

    So, the question below is: how to avoid a full table scan for each and every day?

    Try the step-by-step below to see the approach.
    It is generic enough to extend/apply to your real case; meantime I am using the same example as in your question, and I am limiting the exercise to just 10 days.

    Step 1 – Create a pivot table
    In this step we a) compress each row's content into a record/array and b) put them all into the respective "daily" column.

    #standardSQL
    SELECT
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160101' THEN r END) AS day20160101,
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160102' THEN r END) AS day20160102,
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160103' THEN r END) AS day20160103,
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160104' THEN r END) AS day20160104,
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160105' THEN r END) AS day20160105,
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160106' THEN r END) AS day20160106,
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160107' THEN r END) AS day20160107,
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160108' THEN r END) AS day20160108,
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160109' THEN r END) AS day20160109,
      ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160110' THEN r END) AS day20160110
    FROM (
      SELECT d, r, ROW_NUMBER() OVER(PARTITION BY d) AS line
      FROM (
        SELECT 
          stn, CONCAT('day', year, mo, da) AS d, ARRAY_AGG(t) AS r
        FROM `bigquery-public-data.noaa_gsod.gsod2016` AS t 
        GROUP BY stn, d
      ) 
    )
    GROUP BY line
    

    Run the above query in the Web UI with pivot_table (you can choose whatever name you want here) as the destination.

    As you can see, here we get a table with 10 columns, one column per day, and the schema of each column is a copy of the schema of the original table.
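    Writing out 365 such columns by hand would be tedious; a small generator (a sketch, not part of the original answer) can build the SELECT list for any date range:

```python
from datetime import date, timedelta

def pivot_columns(start, end):
    """One ARRAY_CONCAT_AGG line per day between start and end
    (inclusive), matching the pivot query's column pattern."""
    cols = []
    d = start
    while d <= end:
        day = d.strftime("day%Y%m%d")  # e.g. "day20160101"
        cols.append(
            f"  ARRAY_CONCAT_AGG(CASE WHEN d = '{day}' THEN r END) AS {day}"
        )
        d += timedelta(days=1)
    return ",\n".join(cols)
```

    Paste the generated lines between SELECT and FROM in the pivot query above.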

    Step 2 – Create the sharded tables one by one, ONLY scanning the respective column (no full table scan)

    #standardSQL
    SELECT r.*
    FROM pivot_table, UNNEST(day20160101) AS r
    

    Run the above query from the Web UI with a destination table named mytable_20160101.

    You can run the same for the next day:

    #standardSQL
    SELECT r.*
    FROM pivot_table, UNNEST(day20160102) AS r
    

    Now you should have a destination table named mytable_20160102, and so on.
    You should be able to automate/script this step with any client of your choice. Note: those final daily tables will have exactly the same schema as the original table!

    There are many variations of how you can use the above approach; it is up to your creativity.

    Note: BigQuery allows up to 10,000 columns per table, so 365 columns for the respective days of one year is definitely not a problem here :o)
