I have two different Google Spreadsheets:
One with 4 columns
+------+------+------+------+
| Col1 | Col2 | Col5 | Col6 |
+------+------+------+------+
| ID1 | A | B | C |
| ID2 | D | E | F |
+------+------+------+------+
One with the 4 columns of the previous file, and 2 more columns
+------+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+------+------+
| ID3 | G | H | J | K | L |
| ID4 | M | N | O | P | Q |
+------+------+------+------+------+------+
I configured them as federated sources in Google BigQuery, and now I need to create a view that combines the data of both tables.
Both tables have a Col1 column, which contains an ID. This ID is unique across all the tables; there is no duplicated data.
The resulting table I'm looking for is the following one:
+------+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+------+------+
| ID1 | A | NULL | NULL | B | C |
| ID2 | D | NULL | NULL | E | F |
| ID3 | G | H | J | K | L |
| ID4 | M | N | O | P | Q |
+------+------+------+------+------+------+
For the columns that the first file does not have, I'm expecting a NULL value.
I'm using standard SQL. Here is a statement you can use to generate sample data:
#standardSQL
WITH table1 AS (
SELECT "A" as Col1, "B" as Col2, "C" AS Col3
UNION ALL
SELECT "D" as Col1, "E" as Col2, "F" AS Col3
),
table2 AS (
SELECT "G" as Col1, "H" as Col2, "J" AS Col3, "K" AS Col4, "L" AS Col5
UNION ALL
SELECT "M" as Col1, "N" as Col2, "O" AS Col3, "P" AS Col4, "Q" AS Col5
)
A simple UNION ALL does not work because the tables have different numbers of columns:
SELECT * FROM table1
UNION ALL
SELECT * FROM table2
Error: Queries in UNION ALL have mismatched column count; query 1 has 3 columns, query 2 has 5 columns at [17:1]
The wildcard operator is not an option either, because federated sources do not support it:
SELECT * FROM `table*`
Error: External tables cannot be queried through prefix
Of course this is only sample data with 3-5 columns; the real tables have 20-40 columns, so a solution where I have to explicitly SELECT field by field is not a viable option.
Is there a working way to combine these two tables?
#standardSQL
SELECT *, NULL AS Col5, NULL AS Col6 FROM table1
UNION ALL
SELECT * FROM table2
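You can check this with sample data in which table1 lacks only the trailing columns Col5 and Col6; UNION ALL matches columns by position, so the appended NULLs line up:
#standardSQL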
WITH table1 AS (
SELECT "ID1" AS Col1, "A" AS Col2, "B" AS Col3, "C" AS Col4
UNION ALL
SELECT "ID2", "D", "E", "F"
),
table2 AS (
SELECT "ID3" AS Col1, "G" AS Col2, "H" AS Col3, "J" AS Col4, "K" AS Col5, "L" AS Col6
UNION ALL
SELECT "ID4", "M", "N", "O", "P", "Q"
)
SELECT *, NULL AS Col5, NULL AS Col6 FROM table1
UNION ALL
SELECT * FROM table2
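Because UNION ALL matches columns by position rather than by name, appending NULLs at the end only works when the missing columns are the trailing ones. In the layout from the question, Col3 and Col4 are missing from the middle of the first table, so the select list has to place the NULLs explicitly. With 20-40 columns it may be easier to generate that statement than to type it. The following is a minimal sketch in JavaScript: `buildPaddedSelect` and the column arrays are hypothetical, and it assumes every column is a STRING, as in the sample data.

```javascript
// Hypothetical helper: builds a SELECT whose column order follows `targetColumns`,
// emitting a typed NULL for every column that the source table lacks.
// Assumption: all columns are STRING, as in the sample data.
function buildPaddedSelect(tableName, tableColumns, targetColumns) {
  const present = new Set(tableColumns);
  const selectList = targetColumns.map((col) =>
    present.has(col) ? col : `CAST(NULL AS STRING) AS ${col}`
  );
  return `SELECT ${selectList.join(", ")} FROM ${tableName}`;
}

// Column layout taken from the two spreadsheets described in the question.
const targetColumns = ["Col1", "Col2", "Col3", "Col4", "Col5", "Col6"];
const query = [
  buildPaddedSelect("table1", ["Col1", "Col2", "Col5", "Col6"], targetColumns),
  buildPaddedSelect("table2", targetColumns, targetColumns),
].join("\nUNION ALL\n");

console.log(query);
```

The generated text can then be pasted into the view definition. The column lists still have to come from somewhere, for example copied from each sheet's header row.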
You can pass the rows through a UDF to handle the case where column names aren't aligned by position or there are different numbers of them between tables. Here is an example:
CREATE TEMP FUNCTION CoerceRow(json_row STRING)
RETURNS STRUCT<Col1 STRING, Col2 STRING, Col3 STRING, Col4 STRING, Col5 STRING>
LANGUAGE js AS """
return JSON.parse(json_row);
""";
WITH table1 AS (
SELECT "A" as Col5, "B" as Col3, "C" AS Col2
UNION ALL
SELECT "D" as Col5, "E" as Col3, "F" AS Col2
),
table2 AS (
SELECT "G" as Col1, "H" as Col2, "J" AS Col3, "K" AS Col4, "L" AS Col5
UNION ALL
SELECT "M" as Col1, "N" as Col2, "O" AS Col3, "P" AS Col4, "Q" AS Col5
)
SELECT CoerceRow(json_row).*
FROM (
SELECT TO_JSON_STRING(t1) AS json_row
FROM table1 AS t1
UNION ALL
SELECT TO_JSON_STRING(t2) AS json_row
FROM table2 AS t2
);
+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 |
+------+------+------+------+------+
| NULL | C | B | NULL | A |
| NULL | F | E | NULL | D |
| G | H | J | K | L |
| M | N | O | P | Q |
+------+------+------+------+------+
Note that the CoerceRow function needs to declare the explicit row type that you want in the output. Outside of that, the columns in the tables being unioned are just matched by name.
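To see why the columns end up aligned by name, it can help to run the same idea outside BigQuery. The sketch below is plain JavaScript, not BigQuery code: `coerceRow` and `OUTPUT_FIELDS` are hypothetical names, and the explicit null-filling imitates what the result table above shows for fields that are declared in the STRUCT return type but absent from the parsed row.

```javascript
// Field names as declared in the UDF's STRUCT return type.
const OUTPUT_FIELDS = ["Col1", "Col2", "Col3", "Col4", "Col5"];

// Hypothetical stand-in for CoerceRow plus the conversion to the declared STRUCT:
// each declared field is looked up by name; fields missing from the row become null.
function coerceRow(jsonRow) {
  const parsed = JSON.parse(jsonRow);
  const row = {};
  for (const field of OUTPUT_FIELDS) {
    row[field] = field in parsed ? parsed[field] : null;
  }
  return row;
}

// One row from each sample table, serialized as a JSON object keyed by column name.
const rows = [
  '{"Col5":"A","Col3":"B","Col2":"C"}',
  '{"Col1":"G","Col2":"H","Col3":"J","Col4":"K","Col5":"L"}',
].map(coerceRow);

console.log(rows);
```

The first row comes out with null for Col1 and Col4 and with C, B, A under Col2, Col3, Col5, which corresponds to the first line of the result table.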
Source: https://stackoverflow.com/questions/48192021/bigquery-union-two-different-tables-which-are-based-on-federated-google-spreads