Nest multiple repeated fields in BigQuery

情书的邮戳 2021-01-25 20:32

Loading repeated fields in GBQ by importing a JSON file

By importing a JSON file with repeated records in BigQuery, you can create a table with nested repeated fields.
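
As a rough illustration of what such a file might look like, the sketch below serializes already-nested records as newline-delimited JSON (one object per line), the JSON layout BigQuery load jobs expect. The field names (`item`, `clicks`, `click_time`, `userid`), the sample values, and the helper `to_ndjson` are illustrative assumptions, not part of any BigQuery API. `SCHEMA` shows one plausible matching schema in BigQuery's JSON schema format.

```python
import json

# Hypothetical schema for the nested table (BigQuery JSON schema format).
SCHEMA = [
    {"name": "item", "type": "STRING"},
    {
        "name": "clicks",
        "type": "RECORD",
        "mode": "REPEATED",  # REPEATED RECORD = nested repeated field
        "fields": [
            {"name": "click_time", "type": "STRING"},
            {"name": "userid", "type": "STRING"},
        ],
    },
]

# Illustrative records: each row carries a list of click records.
RECORDS = [
    {"item": "a1", "clicks": [
        {"click_time": "2016-03-03 19:52:23 UTC", "userid": "u1"},
        {"click_time": "2016-03-03 19:52:23 UTC", "userid": "u2"},
    ]},
    {"item": "a2", "clicks": [
        {"click_time": "2016-03-03 19:52:23 UTC", "userid": "u1"},
    ]},
]


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records) + "\n"


if __name__ == "__main__":
    # Print the file contents; redirect to a .json file to load it.
    print(to_ndjson(RECORDS), end="")
```

Each output line is a complete JSON object, and the `clicks` array on each line becomes the repeated record for that row.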

2 Answers
  • 2021-01-25 20:54

    With the introduction of BigQuery Standard SQL, there is an easy way to deal with records.
    Try the query below. Don't forget to uncheck the Use Legacy SQL checkbox under Show Options.

    WITH YourTable AS (
      SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u1' AS userid UNION ALL
      SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u2' AS userid UNION ALL
      SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u3' AS userid UNION ALL
      SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u4' AS userid UNION ALL
      SELECT 'a2' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u1' AS userid UNION ALL
      SELECT 'a2' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u2' AS userid
    )
    SELECT item, ARRAY_AGG(STRUCT(click_time, userid)) AS clicks
    FROM YourTable
    GROUP BY item
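
To make the shape of the result concrete, here is a minimal local sketch of what `ARRAY_AGG(STRUCT(click_time, userid)) ... GROUP BY item` computes, written in plain Python rather than run against BigQuery. The helper name `nest_clicks` and the in-memory `ROWS` list are assumptions for illustration only.

```python
from collections import defaultdict

# The same flat rows as in the WITH clause of the query.
ROWS = [
    ("a1", "2016-03-03 19:52:23 UTC", "u1"),
    ("a1", "2016-03-03 19:52:23 UTC", "u2"),
    ("a1", "2016-03-03 19:52:23 UTC", "u3"),
    ("a1", "2016-03-03 19:52:23 UTC", "u4"),
    ("a2", "2016-03-03 19:52:23 UTC", "u1"),
    ("a2", "2016-03-03 19:52:23 UTC", "u2"),
]


def nest_clicks(rows):
    """Group flat (item, click_time, userid) rows into one row per item."""
    grouped = defaultdict(list)
    for item, click_time, userid in rows:
        # Each appended dict plays the role of STRUCT(click_time, userid).
        grouped[item].append({"click_time": click_time, "userid": userid})
    # This sketch keeps input order; ARRAY_AGG without ORDER BY does not
    # guarantee any particular element order.
    return [{"item": item, "clicks": clicks} for item, clicks in grouped.items()]


if __name__ == "__main__":
    for row in nest_clicks(ROWS):
        print(row["item"], len(row["clicks"]))
```

The six flat rows collapse into two rows, one per item, each holding its clicks as a list of records.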
    
  • 2021-01-25 20:59

    Assume you have flattened data in your table:

    item  click_time               userid
    a1    2016-03-03 19:52:23 UTC  u1
    a1    2016-03-03 19:52:23 UTC  u2
    a1    2016-03-03 19:52:23 UTC  u3
    a1    2016-03-03 19:52:23 UTC  u4
    a2    2016-03-03 19:52:23 UTC  u1
    a2    2016-03-03 19:52:23 UTC  u2
    

    The BigQuery query below (legacy SQL) does what you ask for.
    Please note: you need to write the result to a table with the 'Allow Large Results' and 'UnFlatten' options.

    SELECT *
    FROM JS( 
      ( // input table 
        SELECT item, NEST(CONCAT(STRING(click_time), ',', STRING(userid))) AS clicks 
        FROM YourTable
        GROUP BY item
      ), 
      item, clicks, // input columns 
      "[ // output schema 
        {'name': 'item', 'type': 'STRING'},
         {'name': 'clicks', 'type': 'RECORD',
         'mode': 'REPEATED',
         'fields': [
           {'name': 'click_time', 'type': 'STRING'},
           {'name': 'userid', 'type': 'STRING'}
           ]    
         }
      ]", 
      "function(row, emit) { // function 
        var c = []; 
        for (var i = 0; i < row.clicks.length; i++) { 
          var x = row.clicks[i].split(','); 
          var t = {click_time:x[0], 
                userid:x[1]} ;
          c.push(t); 
        }; 
        emit({item: row.item, clicks: c}); 
      }"
    ) 
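
For readers who want to check the UDF logic outside BigQuery, the following is a small Python port of the JavaScript function above. It assumes each input row arrives with `clicks` as a list of `'click_time,userid'` strings, as produced by the `NEST(CONCAT(...))` subquery. The name `unpack_clicks` is hypothetical.

```python
def unpack_clicks(row):
    """Split each packed 'click_time,userid' string into a click record."""
    clicks = []
    for packed in row["clicks"]:
        # As in the UDF, only the first two comma-separated parts are used,
        # so a click_time or userid containing a comma would be mis-split.
        parts = packed.split(",")
        clicks.append({"click_time": parts[0], "userid": parts[1]})
    return {"item": row["item"], "clicks": clicks}


if __name__ == "__main__":
    sample = {
        "item": "a2",
        "clicks": [
            "2016-03-03 19:52:23 UTC,u1",
            "2016-03-03 19:52:23 UTC,u2",
        ],
    }
    print(unpack_clicks(sample))
```

Packing the two columns into one string and splitting them again works around `NEST` aggregating a single field at a time, at the cost of choosing a separator that never appears in the data.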
    

    The expected result has one row per item, with clicks as a repeated record of click_time and userid.
