问题
I have a table that looks like this:
I am looking for a table that gives a frequency count of the elements in the fields l_0, l_1, l_2, l_3
.
For example the output should look like this:
| author_id | year | l_o.name | l_0.count| l1.name | l1.count | l2.name | l2.count| l3.name | l3.count|
| 2164089123 | 1987 | biology | 3 | botany | 3 | | | | |
| 2595831531 | 1987 | computer science | 2 | simulation | 2 | computer simulation | 2 | mathematical model | 2 |
Edit:
In some cases the array field might have more than one type of element. For example l_0
could be ['biology', 'biology', 'geometry', 'geometry']
. In that case the output for fields l_0, l_1, l_2, l_3
would be a nested repeated field with all the elements in l_0.name
and all the corresponding counts in the l_0.count
.
回答1:
This should work, assuming you want to count on a per-array basis:
SELECT
author_id,
year,
(SELECT AS STRUCT ANY_VALUE(l_0) AS name, COUNT(*) AS count
FROM UNNEST(l_0) AS l_0) AS l_0,
(SELECT AS STRUCT ANY_VALUE(l_1) AS name, COUNT(*) AS count
FROM UNNEST(l_1) AS l_1) AS l_1,
(SELECT AS STRUCT ANY_VALUE(l_2) AS name, COUNT(*) AS count
FROM UNNEST(l_2) AS l_2) AS l_2,
(SELECT AS STRUCT ANY_VALUE(l_3) AS name, COUNT(*) AS count
FROM UNNEST(l_3) AS l_3) AS l_3
FROM YourTable;
To avoid so much repetition, you can make use of a SQL UDF:
CREATE TEMP FUNCTION GetNameAndCount(elements ARRAY<STRING>) AS (
(SELECT AS STRUCT ANY_VALUE(elem) AS name, COUNT(*) AS count
FROM UNNEST(elements) AS elem)
);
SELECT
author_id,
year,
GetNameAndCount(l_0) AS l_0,
GetNameAndCount(l_1) AS l_1,
GetNameAndCount(l_2) AS l_2,
GetNameAndCount(l_3) AS l_3
FROM YourTable;
If you potentially need to account for multiple different names within an array, you can have the UDF return an array of them with associated counts instead:
CREATE TEMP FUNCTION GetNamesAndCounts(elements ARRAY<STRING>) AS (
ARRAY(
SELECT AS STRUCT elem AS name, COUNT(*) AS count
FROM UNNEST(elements) AS elem
GROUP BY elem
ORDER BY count
)
);
SELECT
author_id,
year,
GetNamesAndCounts(l_0) AS l_0,
GetNamesAndCounts(l_1) AS l_1,
GetNamesAndCounts(l_2) AS l_2,
GetNamesAndCounts(l_3) AS l_3
FROM YourTable;
Note that if you want to perform counting across rows, however, you'll need to unnest the arrays and perform the GROUP BY
at the outer level, but it doesn't look like this is your intention based on the question.
来源:https://stackoverflow.com/questions/48411331/how-to-count-frequency-of-elements-in-a-bigquery-array-field