BigQuery - json_extract all elements from an array

攒了一身酷 2020-12-16 20:06

I'm trying to extract two keys from every JSON object in an array of JSON objects (using legacy SQL). Currently I am using the json_extract function:

json_extract(json_column ,          


        
2 answers
  • 2020-12-16 20:23

    Update 2020: JSON_EXTRACT_ARRAY()

    Now BigQuery supports JSON_EXTRACT_ARRAY():

    • https://cloud.google.com/bigquery/docs/reference/standard-sql/json_functions#json_extract_array

    For example, to solve this particular question:

    SELECT id
      , ARRAY(
          SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') 
          FROM UNNEST(JSON_EXTRACT_ARRAY(payload, "$.commits"))x
      ) emails
    FROM `githubarchive.day.20180830` 
    WHERE type='PushEvent' 
    AND id='8188163772'
    


    Previous answer

    Let's start with a similar problem - this is not a very convenient way to extract all emails from a json array:

    SELECT id
      , [ JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[0].author.email')  
          , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[1].author.email')  
          , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[2].author.email')  
          , JSON_EXTRACT_SCALAR(JSON_EXTRACT(payload, '$.commits'), '$[3].author.email')
        ] emails
    FROM `githubarchive.day.20180830` 
    WHERE type='PushEvent' 
    AND id='8188163772'
    

    The best way we have right now to deal with this is to use some JavaScript in a UDF to split a JSON array into a SQL array:

    CREATE TEMP FUNCTION json2array(json STRING)
    RETURNS ARRAY<STRING>
    LANGUAGE js AS """
      return JSON.parse(json).map(x=>JSON.stringify(x));
    """; 
    
    SELECT * EXCEPT(array_commits),
      ARRAY(SELECT JSON_EXTRACT_SCALAR(x, '$.author.email') FROM UNNEST(array_commits) x) emails
    FROM (
      SELECT id
        , json2array(JSON_EXTRACT(payload, '$.commits')) array_commits
      FROM `githubarchive.day.20180830` 
      WHERE type='PushEvent' 
      AND id='8188163772'
    )
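
To see what the json2array UDF body does on its own, here is a minimal sketch in plain JavaScript, outside BigQuery. The sample payload and the example.com addresses are made up for illustration; only the function body comes from the UDF above.

```javascript
// Same body as the json2array UDF above, wrapped as an ordinary function.
function json2array(json) {
  // Parse the JSON array, then re-serialize each element as its own JSON string.
  return JSON.parse(json).map(x => JSON.stringify(x));
}

// Hypothetical input, shaped like the value found at '$.commits'.
const commits =
  '[{"author":{"email":"a@example.com"}},{"author":{"email":"b@example.com"}}]';

// One JSON string per array element.
const elements = json2array(commits);

// Each element can be parsed again, which is roughly what
// JSON_EXTRACT_SCALAR(x, '$.author.email') does per element in the query.
const emails = elements.map(e => JSON.parse(e).author.email);

console.log(elements);
console.log(emails);
```

Each element stays a JSON string rather than a parsed object, which is why the SQL side can keep using JSON_EXTRACT_SCALAR on every unnested value.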
    

  • 2020-12-16 20:24

    May 1st, 2020 Update

    A new function, JSON_EXTRACT_ARRAY, has just been added to the list of JSON functions. It lets you extract the contents of a JSON document as a string array.

    So in the original answer further down, you can replace the CUSTOM_JSON_EXTRACT UDF with the built-in JSON_EXTRACT_ARRAY function, as in this example:

    #standardSQL
    SELECT 
      JSON_EXTRACT_SCALAR(json , '$.X') AS X,
      JSON_EXTRACT_SCALAR(json , '$.Y') AS Y
    FROM t, UNNEST(JSON_EXTRACT_ARRAY(json_column , '$')) json   
    

    ==============

    The example below is for BigQuery Standard SQL. It lets you work with JSONPath in close to the standard way, with no extra manipulation: you simply call the CUSTOM_JSON_EXTRACT(json, json_path) function.

    #standardSQL
    CREATE TEMPORARY FUNCTION CUSTOM_JSON_EXTRACT(json STRING, json_path STRING)
    RETURNS ARRAY<STRING>
    LANGUAGE js AS """
            return jsonPath(JSON.parse(json), json_path);
    """
    OPTIONS (
        library="gs://your_bucket/jsonpath-0.8.0.js"
    );
    WITH t AS (
    SELECT '''
    [
    {"blabla1":1,"X":1,"blabla2":3,"blabla3":5,"blabla4":7,"Y":"2"},
    {"blabla1":2,"X":3,"blabla2":4,"blabla3":6,"blabla4":8,"Y":"4"}
    ]   
    ''' AS json_column 
    )
    SELECT 
      CUSTOM_JSON_EXTRACT(json_column , '$[*].X') AS X,
      CUSTOM_JSON_EXTRACT(json_column , '$[*].Y') AS Y
    FROM t   
    

    The result will be:

    Row X   Y    
    1   1   2    
        3   4      
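
As a rough illustration of what the '$[*].X' and '$[*].Y' paths select, here is a sketch in plain JavaScript that does not use the jsonpath library. The pluck helper is hypothetical and only covers this one path shape. Converting values to strings is an assumption made to mirror the UDF's ARRAY<STRING> return type.

```javascript
// Sample input copied from the query above.
const jsonColumn = `
[
{"blabla1":1,"X":1,"blabla2":3,"blabla3":5,"blabla4":7,"Y":"2"},
{"blabla1":2,"X":3,"blabla2":4,"blabla3":6,"blabla4":8,"Y":"4"}
]
`;

// Hypothetical helper: a stand-in for the JSONPath expression '$[*].<key>'.
// It is not the jsonPath() function from jsonpath-0.8.0.js.
function pluck(json, key) {
  return JSON.parse(json)
    // Keep only elements that are objects and actually carry the key.
    .filter(item => item !== null && typeof item === "object" && key in item)
    // Stringify each value (assumption: mirrors ARRAY<STRING>).
    .map(item => String(item[key]));
}

const xs = pluck(jsonColumn, "X");
const ys = pluck(jsonColumn, "Y");

console.log(xs);
console.log(ys);
```

The two arrays line up with the X and Y columns in the result above: one row whose X and Y cells each hold one value per element of the JSON array.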
    

    Note: to work around BigQuery's current JSONPath "limitation", the solution above uses a custom function together with an external library, jsonpath-0.8.0.js. The library can be downloaded from https://code.google.com/archive/p/jsonpath/downloads and uploaded to Google Cloud Storage (gs://your_bucket/jsonpath-0.8.0.js).

    I just re-read Felipe's answer. For his example, the solution above would look like this (just as an FYI):

    SELECT 
      id, 
      CUSTOM_JSON_EXTRACT(payload, '$.commits[*].author.email') emails
    FROM `githubarchive.day.20180830` 
    WHERE type='PushEvent' 
    AND id='8188163772'
    