Question
Let's say that I want to do some processing on a table with a nested structure (such as the sample GitHub commits table) using a JavaScript UDF. Because I may want to change which fields the UDF looks at as I iterate on its implementation, I decide to pass entire rows from the table to it. My UDF ends up looking something like this:
#standardSQL
CREATE TEMP FUNCTION GetCommitStats(
input STRUCT<commit STRING, tree STRING, parent ARRAY<STRING>,
author STRUCT<name STRING, email STRING, ...>>)
RETURNS STRUCT<
parent ARRAY<STRING>,
author_name STRING,
diff_count INT64>
LANGUAGE js AS """
[UDF content here]
""";
Then I call the function with a query such as:
SELECT GetCommitStats(t).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;
The most cumbersome part of the UDF declaration is the input struct, since I have to include all of the nested fields and their types. Is there a better way to do this?
Answer 1:
You can use TO_JSON_STRING to convert arbitrary structs and arrays to a JSON string, then parse that string inside your UDF into a JavaScript object for further processing. For example:
#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
RETURNS STRUCT<
parent ARRAY<STRING>,
author_name STRING,
diff_count INT64>
LANGUAGE js AS """
// row is the JSON produced by TO_JSON_STRING, parsed into an object
var row = JSON.parse(json_str);
return {
  parent: row.parent,
  author_name: row.author.name,
  diff_count: row.difference.length  // entries in the difference array
};
""";
SELECT GetCommitStats(TO_JSON_STRING(t)).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;
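Because the UDF body is plain JavaScript, one way to iterate on it is to run the same logic locally against a sample of what TO_JSON_STRING produces before pasting it back into the query. This is a minimal sketch: the wrapper name `getCommitStats` and the sample row are hypothetical and hand-written (not real data from sample_commits); only the field names parent, author.name and difference come from the answer.

```javascript
// Hypothetical local harness: the function body mirrors the UDF body above.
function getCommitStats(json_str) {
  // Same parsing step the UDF performs on the TO_JSON_STRING output.
  var row = JSON.parse(json_str);
  return {
    parent: row.parent,
    author_name: row.author.name,
    diff_count: row.difference.length
  };
}

// Made-up sample shaped like one TO_JSON_STRING(t) row (only a few columns shown).
var sample = JSON.stringify({
  commit: "abc123",
  parent: ["def456"],
  author: { name: "Jane Doe", email: "jane@example.com" },
  difference: [{ new_path: "README.md" }, { new_path: "main.c" }]
});

console.log(JSON.stringify(getCommitStats(sample)));
// {"parent":["def456"],"author_name":"Jane Doe","diff_count":2}
```

Extra keys such as commit are simply ignored by the body, which is what makes passing the whole row convenient while you are still deciding which fields to use.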
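If you want to cut down on the number of columns that are scanned, pass a struct of only the relevant columns to TO_JSON_STRING instead. Passing the whole row t reads every column of the table, whereas STRUCT(parent, author, difference) reads only those three: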
#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
RETURNS STRUCT<
parent ARRAY<STRING>,
author_name STRING,
diff_count INT64>
LANGUAGE js AS """
// row is the JSON produced by TO_JSON_STRING, parsed into an object
var row = JSON.parse(json_str);
return {
  parent: row.parent,
  author_name: row.author.name,
  diff_count: row.difference.length  // entries in the difference array
};
""";
SELECT
GetCommitStats(TO_JSON_STRING(
STRUCT(parent, author, difference)
)).*
FROM `bigquery-public-data.github_repos.sample_commits`;
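With STRUCT(parent, author, difference), the JSON string carries only those three keys, so the body keeps working as long as it reads nothing else. One case the answer does not cover: if author is NULL in some row, TO_JSON_STRING renders it as "author":null and row.author.name throws, which fails the query. The sketch below is a hedged, defensive variant of the UDF body, assuming you would rather get NULL outputs than an error; the function name `getCommitStatsSafe` and both sample strings are hypothetical.

```javascript
// Hypothetical defensive variant of the UDF body; names are illustrative.
function getCommitStatsSafe(json_str) {
  var row = JSON.parse(json_str);
  // TO_JSON_STRING renders SQL NULL as JSON null, so guard nested access.
  var author = row.author || {};
  return {
    parent: row.parent || [],
    author_name: author.name === undefined ? null : author.name,
    // A missing or null array is counted as zero entries (an assumption).
    diff_count: (row.difference || []).length
  };
}

// Hand-written samples shaped like TO_JSON_STRING(STRUCT(parent, author, difference)).
var full = '{"parent":["p1"],"author":{"name":"Ann","email":"a@example.com"},"difference":[{},{},{}]}';
var sparse = '{"parent":[],"author":null,"difference":[]}';

console.log(JSON.stringify(getCommitStatsSafe(full)));
// {"parent":["p1"],"author_name":"Ann","diff_count":3}
console.log(JSON.stringify(getCommitStatsSafe(sparse)));
// {"parent":[],"author_name":null,"diff_count":0}
```

To use it in BigQuery, paste the body (from `var row` through the `return`) between the triple quotes of the UDF; a JavaScript null in the returned object should come back as a SQL NULL in that field.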
Source: https://stackoverflow.com/questions/44031988/how-can-i-pass-a-row-from-my-table-to-a-udf-without-specifying-the-complete-type