Aggregating distinct values from JSONB arrays combined with SQL group by

Posted by 余生颓废 on 2021-01-29 15:13:11

Question


I am trying to aggregate distinct values from JSONB arrays in a SQL GROUP BY statement:

One dataset has many cfiles, and a cfile only ever belongs to one dataset.

SELECT * FROM cfiles;
 id | dataset_id |                property_values (jsonb)                
----+------------+-----------------------------------------------
  1 |          1 | {"Sample Names": ["SampA", "SampB", "SampC"]}
  2 |          1 | {"Sample Names": ["SampA", "SampB", "SampD"]}
  3 |          1 | {"Sample Names": ["SampE"]}
  4 |          2 | {"Sample Names": ["SampA", "SampF"]}
  5 |          2 | {"Sample Names": ["SampG"]}

This query works and returns the result I want, but it's a mess:

SELECT distinct(datasets.id) as dataset_id,
ARRAY_TO_STRING(
  ARRAY(
    SELECT DISTINCT * FROM unnest(
      STRING_TO_ARRAY(
        STRING_AGG(
          DISTINCT REPLACE(
            REPLACE(
              REPLACE(
                REPLACE(
                  cfiles.property_values ->> 'Sample Names', '",' || chr(32) || '"', ';'
                ), '[' , ''
              ), '"' , ''
            ), ']' , ''
          ), ';'
        ), ';'
      )
    ) ORDER BY 1 ASC
  ), '; '
) as sample_names
FROM datasets
JOIN cfiles ON cfiles.dataset_id=datasets.id
GROUP BY datasets.id

 dataset_id |           sample_names            
------------+-----------------------------------
          1 | SampA; SampB; SampC; SampD; SampE
          2 | SampA; SampF; SampG

Is there a better way to write this query without all the string manipulation?

I tried jsonb_array_elements, but it gave me the error: subquery uses ungrouped column "cfiles.property_values" from outer query. So I added cfiles.property_values to the GROUP BY, but then it no longer grouped by dataset_id alone.

Not the result I want:

SELECT DISTINCT datasets.id as dataset_id,
ARRAY_TO_STRING(
  ARRAY(
    SELECT DISTINCT * FROM jsonb_array_elements(
      cfiles.property_values -> 'Sample Names'
    ) ORDER BY 1 ASC
  ), '; '
) as sample_names
FROM datasets
JOIN cfiles ON cfiles.dataset_id=datasets.id
GROUP BY datasets.id, cfiles.property_values

 dataset_id |       sample_names        
------------+---------------------------
          1 | "SampA"; "SampB"; "SampC"
          1 | "SampA"; "SampB"; "SampD"
          1 | "SampE"
          2 | "SampA"; "SampF"
          2 | "SampG"

SQL for creating demo

CREATE TABLE datasets (
  id INT PRIMARY KEY
);

CREATE TABLE cfiles (
  id INT PRIMARY KEY,
  dataset_id INT,
  property_values JSONB,
  FOREIGN KEY (dataset_id) REFERENCES datasets(id)
);

INSERT INTO datasets values (1),(2);

INSERT INTO cfiles values
  (1,1,'{"Sample Names":["SampA", "SampB", "SampC"]}'),
  (2,1,'{"Sample Names":["SampA", "SampB", "SampD"]}'),
  (3,1,'{"Sample Names":["SampE"]}');

INSERT INTO cfiles values 
  (4,2,'{"Sample Names":["SampA", "SampF"]}'),
  (5,2,'{"Sample Names":["SampG"]}');

Answer 1:


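jsonb_array_elements is a set-returning function and should be used in the FROM clause. Using it in the SELECT list makes things unnecessarily complicated. With jsonb_array_elements_text in the FROM clause, each array element becomes its own row of plain text (no surrounding quotes), which string_agg can then de-duplicate and sort per group: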

select c.dataset_id, string_agg(distinct n.name, '; ' order by n.name)
from cfiles c
  cross join jsonb_array_elements_text(c.property_values -> 'Sample Names') as n(name)
group by c.dataset_id
order by c.dataset_id;  

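For comparison with the original query in the question, here is a sketch of the same approach. It joins through datasets, writes the implicit lateral join out explicitly, and names the output column. The aliases d, c, and n and the column name sample_names are illustrative choices, and the query assumes the demo tables created above.

```sql
-- Each element of the "Sample Names" array becomes one row (as text),
-- then string_agg de-duplicates and sorts the names per dataset.
SELECT d.id AS dataset_id,
       string_agg(DISTINCT n.name, '; ' ORDER BY n.name) AS sample_names
FROM datasets d
JOIN cfiles c ON c.dataset_id = d.id
CROSS JOIN LATERAL jsonb_array_elements_text(c.property_values -> 'Sample Names') AS n(name)
GROUP BY d.id
ORDER BY d.id;
```

A function call in the FROM clause may refer to earlier FROM items even without the LATERAL keyword, so this should behave the same as the shorter form. Joining datasets is only needed if you want columns from that table. Because this is an inner join, a dataset with no cfiles, or whose cfiles have no "Sample Names" elements, will not appear in the result.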



Source: https://stackoverflow.com/questions/65850946/aggregating-distinct-values-from-jsonb-arrays-combined-with-sql-group-by
