Nest multiple repeated fields in BigQuery

情书的邮戳

Loading repeated fields in GBQ by importing a JSON file

By importing a JSON file with repeated records in BigQuery, you can create a table with nested repeated fields.
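As a minimal sketch of that approach (not part of the original post), the Python below groups flat rows by item and serializes them as newline-delimited JSON, with clicks as an array of objects. The sample rows, the field names (item, clicks, click_time, userid), and the helper names nest_rows and to_ndjson are assumptions chosen for illustration; SCHEMA follows BigQuery's JSON schema layout for a REPEATED RECORD.

```python
import json
from collections import defaultdict

# Hypothetical flat rows: (item, click_time, userid).
FLAT_ROWS = [
    ("a1", "2016-03-03 19:52:23 UTC", "u1"),
    ("a1", "2016-03-03 19:52:23 UTC", "u2"),
    ("a2", "2016-03-03 19:52:23 UTC", "u1"),
]

# Assumed table schema: 'clicks' is a repeated record with two string fields.
SCHEMA = [
    {"name": "item", "type": "STRING"},
    {"name": "clicks", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "click_time", "type": "STRING"},
        {"name": "userid", "type": "STRING"},
    ]},
]


def nest_rows(rows):
    """Group flat rows by item; each group becomes one record with a list of clicks."""
    grouped = defaultdict(list)
    for item, click_time, userid in rows:
        grouped[item].append({"click_time": click_time, "userid": userid})
    return [{"item": item, "clicks": clicks} for item, clicks in grouped.items()]


def to_ndjson(records):
    """Serialize one JSON object per line (newline-delimited JSON)."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


# Print the file content; each line is one table row with its nested clicks.
print(to_ndjson(nest_rows(FLAT_ROWS)), end="")
```

Saved as, say, clicks.json alongside the schema in schema.json, the output could then be loaded with something like `bq load --source_format=NEWLINE_DELIMITED_JSON mydataset.mytable clicks.json schema.json`, where the dataset, table, and file names are placeholders.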

2 Answers

眼角桃花

2021-01-25 20:54

With the introduction of BigQuery Standard SQL, we have an easy way to deal with records.
Try the query below. Don't forget to uncheck the Use Legacy SQL checkbox under Show Options.

WITH YourTable AS (
  SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u1' AS userid UNION ALL
  SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u2' AS userid UNION ALL
  SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u3' AS userid UNION ALL
  SELECT 'a1' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u4' AS userid UNION ALL
  SELECT 'a2' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u1' AS userid UNION ALL
  SELECT 'a2' AS item,  '2016-03-03 19:52:23 UTC' AS click_time, 'u2' AS userid
)
SELECT item, ARRAY_AGG(STRUCT(click_time, userid)) AS clicks
FROM YourTable
GROUP BY item

半阙折子戏

2021-01-25 20:59

Assume you have flattened data in your table:

item  click_time               userid
a1    2016-03-03 19:52:23 UTC  u1
a1    2016-03-03 19:52:23 UTC  u2
a1    2016-03-03 19:52:23 UTC  u3
a1    2016-03-03 19:52:23 UTC  u4
a2    2016-03-03 19:52:23 UTC  u1
a2    2016-03-03 19:52:23 UTC  u2

The GBQ (legacy SQL) query below does what you ask for.
Please note: you need to write the result to a destination table with the 'Allow Large Results' option enabled and 'Flatten Results' disabled (unflattened).

SELECT *
FROM JS(
  ( // input table: pack each click into one click_time,userid string
    SELECT item, NEST(CONCAT(STRING(click_time), ',', STRING(userid))) AS clicks
    FROM YourTable
    GROUP BY item
  ),
  item, clicks, // input columns
  // output schema
  "[
    {'name': 'item', 'type': 'STRING'},
    {'name': 'clicks', 'type': 'RECORD',
     'mode': 'REPEATED',
     'fields': [
       {'name': 'click_time', 'type': 'STRING'},
       {'name': 'userid', 'type': 'STRING'}
     ]
    }
  ]",
  // function: split each packed string back into a click record
  "function(row, emit) {
    var c = [];
    for (var i = 0; i < row.clicks.length; i++) {
      var x = row.clicks[i].split(',');
      var t = {click_time: x[0], userid: x[1]};
      c.push(t);
    }
    emit({item: row.item, clicks: c});
  }"
)

The result is expected to have one row per item, with clicks as a repeated record of click_time and userid.
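To make the expected result shape concrete, here is a small Python sketch (an illustration only, not BigQuery code) that mirrors what the JS function does to one input row: it splits each packed click_time,userid string produced by NEST(CONCAT(...)) back into a record. The helper name split_clicks and the sample row are hypothetical.

```python
def split_clicks(row):
    """Mirror the UDF: turn each 'click_time,userid' string into a record."""
    clicks = []
    for packed in row["clicks"]:
        # Like the UDF, this assumes neither value contains a comma.
        parts = packed.split(",")
        clicks.append({"click_time": parts[0], "userid": parts[1]})
    return {"item": row["item"], "clicks": clicks}


# Hypothetical row, shaped as the inner query would produce it for item a2.
row = {
    "item": "a2",
    "clicks": ["2016-03-03 19:52:23 UTC,u1", "2016-03-03 19:52:23 UTC,u2"],
}
print(split_clicks(row))
```

Packing the two values into one comma-separated string is what lets the legacy query pass them through a single NEST; the split only works as long as the separator never appears in the data.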
