How to cross join unnest a JSON array in Presto

孤独总比滥情好 2021-01-04 13:33

Given a table that contains a column of JSON like this:

{"payload":[{"type":"b","value":"9"}, {"type":"a","value":"8"}]}
{"payload":[{"t


        
3 answers
  • 2021-01-04 14:03

    Here's an example of that

    with example(message) as (
        VALUES
            (json '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
            (json '{"payload":[{"type":"c","value":"7"}, {"type":"b","value":"3"}]}')
    )
    SELECT
        n.type,
        avg(n.value)
    FROM example
    CROSS JOIN
        UNNEST(
            -- convert the JSON array under "payload" into an array of typed rows
            CAST(
                JSON_EXTRACT(message, '$.payload')
                as ARRAY(ROW(type VARCHAR, value INTEGER))
            )
        ) as x(n)  -- one output row per array element, exposed as n
    WHERE n.type = 'b'
    GROUP BY n.type
    

    with defines a common table expression (CTE) named example, with a single column aliased as message.

    VALUES supplies the rows inline (here, two JSON literals), so the example runs without a real table.

    UNNEST takes an array held in a column of a single row and returns the array's elements as multiple rows. CROSS JOIN pairs each of those rows with the row the array came from.

    CAST converts the JSON value into an ARRAY, which UNNEST requires. It could just as easily have been an array of maps (ARRAY<MAP<...>>), but I find ARRAY(ROW(...)) nicer because you can name the fields and use dot notation (n.type, n.value) in the SELECT clause.

    JSON_EXTRACT evaluates the JSONPath expression '$.payload' and returns the value of the payload key, i.e. the array (still typed as JSON).

    avg() and GROUP BY are ordinary SQL aggregation.
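
    For comparison, here is a sketch of the array-of-maps option mentioned above, using the same sample data and assuming a reasonably recent Presto release where casting a JSON object to a MAP is supported. Because every map value comes back as VARCHAR, value has to be cast before averaging, and fields are read with subscripts (m['type']) rather than dot notation. The aliases payload_type and avg_value are just illustrative names.

    ```sql
    -- Sketch: array-of-maps alternative to ARRAY(ROW(...)), same sample data
    WITH example(message) AS (
        VALUES
            (JSON '{"payload":[{"type":"b","value":"9"},{"type":"a","value":"8"}]}'),
            (JSON '{"payload":[{"type":"c","value":"7"},{"type":"b","value":"3"}]}')
    )
    SELECT
        m['type'] AS payload_type,
        -- map values are VARCHAR here, so cast before averaging
        avg(CAST(m['value'] AS INTEGER)) AS avg_value
    FROM example
    CROSS JOIN UNNEST(
        -- each JSON object in the array becomes a MAP(VARCHAR, VARCHAR)
        CAST(JSON_EXTRACT(message, '$.payload') AS ARRAY(MAP(VARCHAR, VARCHAR)))
    ) AS x(m)
    WHERE m['type'] = 'b'
    GROUP BY m['type']
    -- for this sample data: one row, payload_type = 'b', avg_value = 6.0 (average of 9 and 3)
    ```

    The trade-off is the one described above: a map needs no field list up front, but all values share one type and need casting, while ROW gives typed, named fields.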

  • 2021-01-04 14:13

    The problem was that I was running an old version of Presto.

    unnest was added in version 0.79

    https://github.com/facebook/presto/blob/50081273a9e8c4d7b9d851425211c71bfaf8a34e/presto-docs/src/main/sphinx/release/release-0.79.rst

  • 2021-01-04 14:21

    As you pointed out, this was finally implemented in Presto 0.79. :)

    Here is an example of the cast syntax:

    select cast(cast ('[1,2,3]' as json) as array<bigint>);
    

    A special word of advice: unlike Hive, Presto has no 'string' type. If your array contains strings, make sure you use the type 'varchar'; otherwise you get an error message saying 'type array does not exist', which can be misleading.

    select cast(cast ('["1","2","3"]' as json) as array<varchar>);
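
    One caveat if you try these on a newer Presto release: there, CAST from VARCHAR to JSON no longer parses the text but wraps it as a JSON string value, so the nested casts above may fail. A sketch of the same casts using json_parse, which parses the text explicitly (assuming your Presto version provides json_parse):

    ```sql
    -- json_parse turns the text into a JSON array, which can then be cast to an ARRAY type
    SELECT CAST(json_parse('[1,2,3]') AS ARRAY(BIGINT));

    -- string elements: use VARCHAR, since Presto has no STRING type
    SELECT CAST(json_parse('["1","2","3"]') AS ARRAY(VARCHAR));
    ```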
    