Joining arrays within group by clause

心在旅途 2021-01-05 11:13

We have a problem grouping arrays into a single array. We want to join the values from two columns into one single array and aggregate these arrays of multiple rows.
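
The sample data from the original post is not reproduced here. As a minimal sketch, the setup below assumes a table tbl with the column names used in the first answer (id, name, col1, col2); the column types and the row values are assumptions, chosen to match the results quoted in that answer.

```sql
-- Hypothetical sample table: names follow the first answer, types and values are assumed.
CREATE TABLE tbl (
    id   int PRIMARY KEY
  , name text
  , col1 int
  , col2 int
);

INSERT INTO tbl (id, name, col1, col2) VALUES
    (1, 'a', 1, 2)
  , (2, 'a', 3, 4)
  , (3, 'b', 5, 6)
  , (4, 'b', 7, 8);

-- Goal: one array per name holding col1 and col2 of every row, in id order.
```

With this data, the desired result would be {1,2,3,4} for a and {5,6,7,8} for b.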

2 answers
  • 2021-01-05 11:40

    UNION ALL

    You could "counter-pivot" with UNION ALL first:

    SELECT name, array_agg(c) AS c_arr
    FROM  (
       SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
       UNION ALL
       SELECT name, id, 2, col2 FROM tbl
       ORDER  BY name, id, rnk
       ) sub
    GROUP  BY 1;
    

    Adapted to produce the order of values you later requested. The manual:

    The aggregate functions array_agg, json_agg, string_agg, and xmlagg, as well as similar user-defined aggregate functions, produce meaningfully different result values depending on the order of the input values. This ordering is unspecified by default, but can be controlled by writing an ORDER BY clause within the aggregate call, as shown in Section 4.2.7. Alternatively, supplying the input values from a sorted subquery will usually work.

    Bold emphasis mine.
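
    If "will usually work" is not a strong enough guarantee, the order can be stated inside the aggregate call itself, as the quoted passage describes. A minimal sketch, reusing the rnk column from the query above:

    ```sql
    -- Same counter-pivot, but the sort order is part of the aggregate call,
    -- so the result no longer depends on the row order of the subquery.
    SELECT name, array_agg(c ORDER BY id, rnk) AS c_arr
    FROM  (
       SELECT name, id, 1 AS rnk, col1 AS c FROM tbl
       UNION ALL
       SELECT name, id, 2, col2 FROM tbl
       ) sub
    GROUP  BY name;
    ```

    The price is a sort per aggregate call, which is typically slower than feeding the aggregate from a sorted subquery.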

    LATERAL subquery with VALUES expression

    LATERAL requires Postgres 9.3 or later.

    SELECT t.name, array_agg(c) AS c_arr
    FROM  (SELECT * FROM tbl ORDER BY name, id) t
    CROSS  JOIN LATERAL (VALUES (t.col1), (t.col2)) v(c)
    GROUP  BY 1;
    

    Same result. Only needs a single pass over the table.

    Custom aggregate function

    Or you could create a custom aggregate function, as discussed in these related answers:

    • Selecting data into a Postgres array
    • Is there something like a zip() function in PostgreSQL that combines two arrays?
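
    -- Postgres 14 or later: array_cat takes anycompatiblearray,
    -- so use that type in place of anyarray in both places below.
    CREATE AGGREGATE array_agg_mult (anyarray)  (
        SFUNC     = array_cat
      , STYPE     = anyarray
      , INITCOND  = '{}'
    );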
    

    Then you can:

    SELECT name, array_agg_mult(ARRAY[col1, col2] ORDER BY id) AS c_arr
    FROM   tbl
    GROUP  BY 1
    ORDER  BY 1;
    

    Or, typically faster, while not standard SQL:

    SELECT name, array_agg_mult(ARRAY[col1, col2]) AS c_arr
    FROM  (SELECT * FROM tbl ORDER BY name, id) t
    GROUP  BY 1;
    

    In the first variant, the ORDER BY id inside the aggregate call (which can be appended to any such aggregate function) guarantees your desired result:

    a | {1,2,3,4}
    b | {5,6,7,8}
    

    Or you might be interested in this alternative:

    SELECT name, array_agg_mult(ARRAY[ARRAY[col1, col2]] ORDER BY id) AS c_arr
    FROM   tbl
    GROUP  BY 1
    ORDER  BY 1;
    

    Which produces 2-dimensional arrays:

    a | {{1,2},{3,4}}
    b | {{5,6},{7,8}}
    

    The last one can be replaced (and should be, as it's faster!) with the built-in array_agg() in Postgres 9.5 or later - with its added capability of aggregating arrays:

    SELECT name, array_agg(ARRAY[col1, col2] ORDER BY id) AS c_arr
    FROM   tbl
    GROUP  BY 1
    ORDER  BY 1;
    

    Same result. The manual:

    input arrays concatenated into array of one higher dimension (inputs must all have same dimensionality, and cannot be empty or null)

    So it is not exactly the same as our custom aggregate function array_agg_mult().
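
    To illustrate the difference, here is a small sketch with made-up inputs. The VALUES list and the column names ord and arr are hypothetical; array_agg_mult is the custom aggregate defined earlier in this answer.

    ```sql
    -- One-dimensional inputs of different lengths, including an empty array.
    -- array_cat just appends elements, so the custom aggregate returns {1,2,3}.
    SELECT array_agg_mult(arr ORDER BY ord) AS flat
    FROM  (VALUES (1, '{1,2}'::int[]), (2, '{}'), (3, '{3}')) v(ord, arr);

    -- The built-in aggregate is expected to raise an error for the same input,
    -- because it does not accept empty input arrays.
    SELECT array_agg(arr ORDER BY ord) AS nested
    FROM  (VALUES (1, '{1,2}'::int[]), (2, '{}'), (3, '{3}')) v(ord, arr);
    ```

    So the built-in array_agg() only replaces the custom function where every input array is non-empty and has the same dimensions.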

  • 2021-01-05 11:51
    select n, array_agg(c) as c
    from (
        select n, unnest(array[c1, c2]) as c
        from t
    ) s
    group by n
    

    Or, more simply (note that this puts all c1 values before all c2 values):

    select
        n,
        array_agg(c1) || array_agg(c2) as c
    from t
    group by n
    

    To address the new ordering requirement:

    select n, array_agg(c order by id, o) as c
    from (
        select
            id, n,
            unnest(array[c1, c2]) as c,
            unnest(array[1, 2]) as o
        from t
    ) s
    group by n
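
    A possible variant of the last query, as a sketch: unnest() with WITH ORDINALITY (Postgres 9.4 or later) supplies the position o directly, so the query does not depend on two parallel unnest() calls in the select list. Table and column names are the same as in the queries above.

    ```sql
    -- u.c is the unnested value, u.o its position (1 for c1, 2 for c2).
    select t.n, array_agg(u.c order by t.id, u.o) as c
    from t
    cross join lateral unnest(array[t.c1, t.c2]) with ordinality as u(c, o)
    group by t.n;
    ```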
    