So I'm writing a BigQuery query and basically just need to be able to check if any of a number of strings are present as elements in one of the columns of the table, where the cared-about column itself contains arrays of strings. Just for context, I'm writing the query as part of a little automated Python job and am using standard SQL.
I couldn't find anything that would explicitly check for array inclusion here: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators
So I came up with a solution that employs a pretty hacky regex, specifically:
...other query stuff...
WHERE
REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), r"({joined_string})")
...where column
is the column I care about in the table, and joined_string
is a long string composed of all the strings I need to check for joined by |
(where |
serves as the regex OR operator).
Does there exist some kind of built-in functionality in BigQuery standard SQL that allows one to do this more sanely?
Below are two examples.
First assuming you have your strings in another table strings
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
SELECT 2, ['123', '456', '789'] UNION ALL
SELECT 3, ['135', '246', '369']
),
strings AS (
SELECT 'abc' AS str UNION ALL
SELECT '123' UNION ALL
SELECT '456'
)
SELECT *
FROM yourTable
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) > 0
You can add below to SELECT
list if you need to see how many strings are matching
(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) AS cnt
Second example assumes you have list of strings packed in Array
#standardSQL
WITH yourTable AS (
SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
SELECT 2, ['123', '456', '789'] UNION ALL
SELECT 3, ['135', '246', '369']
),
strings AS (
SELECT ['abc', 'def', '456'] AS strs
)
SELECT yourTable.*
FROM yourTable, strings
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) > 0
Same as in first example - you can add below to SELECT
list to see matches count
(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) AS cnt
来源:https://stackoverflow.com/questions/42774626/bigquery-check-for-array-overlap