How can I pass a row from my table to a UDF without specifying the complete type?

Submitted by 泪湿孤枕 on 2019-12-08 04:28:10

Question


Let's say that I want to use a JavaScript UDF to process a table that has a nested structure (such as the sample GitHub commits). I may want to change which fields the UDF looks at as I iterate on its implementation, so I decide to pass entire rows from the table to it. My UDF ends up looking something like this:

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(
  input STRUCT<commit STRING, tree STRING, parent ARRAY<STRING>,
               author STRUCT<name STRING, email STRING, ...>>)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
[UDF content here]
""";

Then I call the function with a query such as:

SELECT GetCommitStats(t).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;

The most cumbersome part of the UDF declaration is the input struct, since I have to include all of the nested fields and their types. Is there a better way to do this?


Answer 1:


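You can use TO_JSON_STRING to convert arbitrary structs and arrays to a JSON string, then parse that string inside your UDF into an object for further processing. The UDF then only needs a single STRING parameter, regardless of how the row is structured. For example: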

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
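// Parse the JSON produced by TO_JSON_STRING back into a JavaScript object.
var row = JSON.parse(json_str);
// The keys of the result must match the field names in the RETURNS struct.
var result = {};
result['parent'] = row.parent;
result['author_name'] = row.author.name;
// difference is an array, so its length is the number of diff entries.
result['diff_count'] = row.difference.length;
return result;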
""";

SELECT GetCommitStats(TO_JSON_STRING(t)).*
FROM `bigquery-public-data.github_repos.sample_commits` AS t;
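
To see what the UDF body receives, here is a minimal sketch that runs the same logic outside BigQuery (for example in Node.js). The sample JSON string is hypothetical: it only mimics the shape that TO_JSON_STRING would produce for one row, and the field values (and the new_path key) are invented for illustration.

```javascript
// Hypothetical JSON for one row, shaped like the output of TO_JSON_STRING(t).
// All values here are made up for illustration.
const jsonStr = JSON.stringify({
  commit: "abc123",
  parent: ["def456"],
  author: { name: "Jane Doe", email: "jane@example.com" },
  difference: [{ new_path: "a.txt" }, { new_path: "b.txt" }]
});

// Same logic as the UDF body above, wrapped in an ordinary function.
function getCommitStats(json_str) {
  const row = JSON.parse(json_str);
  return {
    parent: row.parent,
    author_name: row.author.name,
    diff_count: row.difference.length
  };
}

const stats = getCommitStats(jsonStr);
console.log(stats);
```

Testing the body locally like this can make it easier to iterate on the field logic before pasting it back into the CREATE TEMP FUNCTION statement.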

If you want to cut down on the number of columns that are scanned, you can pass a struct of the relevant columns to TO_JSON_STRING instead:

#standardSQL
CREATE TEMP FUNCTION GetCommitStats(json_str STRING)
  RETURNS STRUCT<
    parent ARRAY<STRING>,
    author_name STRING,
    diff_count INT64>
  LANGUAGE js AS """
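// json_str holds only the parent, author, and difference columns.
var row = JSON.parse(json_str);
// The keys of the result must match the field names in the RETURNS struct.
var result = {};
result['parent'] = row.parent;
result['author_name'] = row.author.name;
// difference is an array, so its length is the number of diff entries.
result['diff_count'] = row.difference.length;
return result;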
""";

SELECT
  GetCommitStats(TO_JSON_STRING(
    STRUCT(parent, author, difference)
  )).*
FROM `bigquery-public-data.github_repos.sample_commits`;
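
One thing to keep in mind with this approach: TO_JSON_STRING encodes a NULL value as JSON null, so an expression such as row.author.name would throw if author were NULL for some row. The following is a sketch of a more defensive UDF body, written as a plain function so it can be tried outside BigQuery. The name safeCommitStats is hypothetical, and whether NULLs actually occur in this particular table is an assumption, not something stated above.

```javascript
// Hypothetical defensive variant of the UDF body (not part of the original answer).
function safeCommitStats(json_str) {
  const row = JSON.parse(json_str);
  // Fall back to empty values when a field is null or absent.
  const author = row.author || {};
  const difference = row.difference || [];
  return {
    parent: row.parent || [],
    // Returning null from a JavaScript UDF maps to SQL NULL.
    author_name: author.name === undefined ? null : author.name,
    diff_count: difference.length
  };
}

// Usage with an invented row in which every field is null.
const safeResult = safeCommitStats('{"parent":null,"author":null,"difference":null}');
console.log(safeResult);
```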


Source: https://stackoverflow.com/questions/44031988/how-can-i-pass-a-row-from-my-table-to-a-udf-without-specifying-the-complete-type
