The original "why" of what I want to do is:
Restore a table maintaining its original partitioning instead of it all going into today's partition.
Answering myself here. Another approach I've seen done is to write a script that:
1. Parses the tablebackup.json file and outputs many files tablebackuppartitionYYYYMMDD.json, split on a provided parameter.
2. Creates a batch script to bq load all the files into the appropriate table partitions.
The script would need to process the backup row by row, or in chunks, to be able to handle massive backups, and it would take some time. The advantage of this method is that it is generic and usable by a sysadmin untrained in BigQuery.
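A minimal Python sketch of such a script, under a few assumptions: the backup is newline-delimited JSON, the partitioning column is named pdate with values like 2016-01-01 (both hypothetical, adjust to your schema), and mydataset.mytable is a placeholder target. Loading with the $YYYYMMDD partition decorator writes each file into that specific partition.

#!/usr/bin/env python3
# Sketch: split a newline-delimited JSON backup into per-day files and
# emit a batch script of bq load commands, one per partition.
import json

DATE_FIELD = 'pdate'  # hypothetical: the "provided parameter" naming the partitioning column
load_cmds = []
writers = {}

with open('tablebackup.json') as src:
    for line in src:  # row by row, so a massive backup never sits in memory
        day = json.loads(line)[DATE_FIELD].replace('-', '')[:8]  # 2016-01-01 -> 20160101
        if day not in writers:
            writers[day] = open('tablebackuppartition%s.json' % day, 'w')
            load_cmds.append(
                "bq load --source_format=NEWLINE_DELIMITED_JSON "
                "'mydataset.mytable$%s' tablebackuppartition%s.json" % (day, day))
        writers[day].write(line)

for w in writers.values():
    w.close()

with open('load_all.sh', 'w') as batch:  # the batch script: one bq load per day
    batch.write('\n'.join(load_cmds) + '\n')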
The main problem here is having a full table scan for each and every day. The rest is less of a problem and can easily be scripted out in any client of your choice.
So, the below addresses the question: how to avoid a full table scan for each and every day?
Try the below step-by-step to see the approach.
It is generic enough to extend/apply to your real case; meantime I am using the same example as in your question, and I am limiting the exercise to just 10 days.
Step 1 – Create a pivot table
In this step we a) compress each row's content into a record/array and b) put them all into the respective "daily" column.
#standardSQL
SELECT
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160101' THEN r END) AS day20160101,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160102' THEN r END) AS day20160102,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160103' THEN r END) AS day20160103,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160104' THEN r END) AS day20160104,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160105' THEN r END) AS day20160105,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160106' THEN r END) AS day20160106,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160107' THEN r END) AS day20160107,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160108' THEN r END) AS day20160108,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160109' THEN r END) AS day20160109,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160110' THEN r END) AS day20160110
FROM (
  SELECT d, r, ROW_NUMBER() OVER(PARTITION BY d) AS line
  FROM (
    SELECT
      stn, CONCAT('day', year, mo, da) AS d, ARRAY_AGG(t) AS r
    FROM `bigquery-public-data.noaa_gsod.gsod2016` AS t
    GROUP BY stn, d
  )
)
GROUP BY line
Run the above query in the Web UI with pivot_table (you can choose whatever name you want here) as the destination table.
As you can see, here we get a table with 10 columns: one column for each day, and the schema of each column is a copy of the schema of the original table.
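For your real case, writing 365 such ARRAY_CONCAT_AGG lines by hand would be tedious, so the query text itself can be generated. A minimal Python sketch, using the same public table as above; the date range is an assumption, set it to the span you need.

# Sketch: generate the pivot query for an arbitrary date range.
from datetime import date, timedelta

start, end = date(2016, 1, 1), date(2016, 1, 10)  # assumed span; adjust as needed
days = [(start + timedelta(n)).strftime('%Y%m%d')
        for n in range((end - start).days + 1)]
cols = ',\n'.join(
    "  ARRAY_CONCAT_AGG(CASE WHEN d = 'day%s' THEN r END) AS day%s" % (d, d)
    for d in days)
print("""#standardSQL
SELECT
%s
FROM (
  SELECT d, r, ROW_NUMBER() OVER(PARTITION BY d) AS line
  FROM (
    SELECT
      stn, CONCAT('day', year, mo, da) AS d, ARRAY_AGG(t) AS r
    FROM `bigquery-public-data.noaa_gsod.gsod2016` AS t
    GROUP BY stn, d
  )
)
GROUP BY line""" % cols)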
Step 2 – Create the sharded tables one by one, ONLY scanning the respective column (no full table scan)
#standardSQL
SELECT r.*
FROM pivot_table, UNNEST(day20160101) AS r
Run the above query from the Web UI with a destination table named mytable_20160101.
You can run the same for the next day:
#standardSQL
SELECT r.*
FROM pivot_table, UNNEST(day20160102) AS r
Now you should have a destination table named mytable_20160102, and so on.
You should be able to automate/script this step with any client of your choice.
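For example, a minimal sketch with the google-cloud-bigquery Python client; the dataset name (mydataset) and the date range are assumptions, rename to match yours.

# Sketch: loop over the days and materialize each pivot column into its
# own sharded table, scanning only that column each time.
from datetime import date, timedelta
from google.cloud import bigquery

client = bigquery.Client()
start, end = date(2016, 1, 1), date(2016, 1, 10)  # assumed span
for n in range((end - start).days + 1):
    day = (start + timedelta(n)).strftime('%Y%m%d')
    config = bigquery.QueryJobConfig(
        destination='%s.mydataset.mytable_%s' % (client.project, day))
    sql = 'SELECT r.* FROM mydataset.pivot_table, UNNEST(day%s) AS r' % day
    client.query(sql, job_config=config).result()  # wait for the query to finish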
Note: those final daily tables will have exactly the same schema as the original table!
There are many variations of how you can use the above approach; it is up to your creativity.
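One such variation, since the original goal was restoring partitions: each daily table can then be copied into the respective partition of an ingestion-time partitioned table using the partition decorator, e.g. bq cp mydataset.mytable_20160101 'mydataset.mytable$20160101' (table names here are placeholders).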
Note: BigQuery allows up to 10,000 columns per table, so 365 columns for the respective days of one year is definitely not a problem here :o)