How to get unique values from each column based on a condition?

落花浮王杯 submitted on 2019-12-05 21:43:19
Erwin Brandstetter

You can't just return rows, since distinct values don't go together any more.

You could return arrays instead, which is simpler than you might expect:

SELECT array_agg(DISTINCT c1)  AS c1_arr
      ,array_agg(DISTINCT c2a) AS c2a_arr
      ,array_agg(DISTINCT c2b) AS c2b_arr
      , ...
FROM   m0301010000_ds;

This returns distinct values per column. One array (possibly big) for each column. All connections between values in columns (what used to be in the same row) are lost in the output.
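To make the shape of the result concrete, here is a minimal sketch on a throwaway table. The table name demo_ds and its rows are made up for illustration; only the column names follow the query above.

```sql
-- Hypothetical demo table and data, not part of the original question.
CREATE TEMP TABLE demo_ds (c1 int, c2a text, c2b text);

INSERT INTO demo_ds (c1, c2a, c2b) VALUES
   (1, 'a', 'x')
 , (1, 'b', 'x')
 , (2, 'a', NULL);

-- One result row with one array per column.
SELECT array_agg(DISTINCT c1)  AS c1_arr
      ,array_agg(DISTINCT c2a) AS c2a_arr
      ,array_agg(DISTINCT c2b) AS c2b_arr
FROM   demo_ds;
```

Three input rows collapse into a single row. c1_arr and c2a_arr hold two elements each. c2b_arr also holds two, because array_agg() does not skip NULL, so NULL shows up as an array element.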

Build SQL automatically

CREATE OR REPLACE FUNCTION f_build_sql_for_dist_vals(_tbl regclass)
  RETURNS text AS
$func$
SELECT 'SELECT ' || string_agg(format('array_agg(DISTINCT %1$I) AS %2$I'
                                     , attname, attname || '_arr')
                              , E'\n      ,' ORDER  BY attnum)
        || E'\nFROM   ' || _tbl
FROM   pg_attribute
WHERE  attrelid = _tbl            -- valid, visible table name 
AND    attnum >= 1                -- exclude tableoid & friends
AND    NOT attisdropped           -- exclude dropped columns
$func$  LANGUAGE sql;

Call:

SELECT f_build_sql_for_dist_vals('public.m0301010000_ds');

Returns an SQL string as displayed above.
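The function only builds the statement; something still has to run it. One possible way, assuming you work in psql 9.6 or later, is the \gexec meta-command. This is a sketch of one option, not the only one:

```sql
-- psql only: \gexec sends the query to the server, then executes
-- each returned cell as an SQL statement of its own.
SELECT f_build_sql_for_dist_vals('public.m0301010000_ds') \gexec
```

In other clients you would fetch the string and send it as a second query.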

I use the system catalog pg_attribute instead of the information schema, and the object identifier type regclass for the table name. More explanation in this related answer:
PLpgSQL function to find columns with only NULL values in a given table
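A short illustration of what the regclass cast and the catalog filter do. It uses the catalog table pg_attribute itself as the subject, because it exists in every database. The name no_such_table_for_demo is hypothetical and chosen so that the lookup fails. Run the statements one at a time, since the second one raises an error.

```sql
-- Resolves the name (honoring the search_path) to the table's OID.
SELECT 'pg_catalog.pg_attribute'::regclass;

-- Raises an error because no such relation exists, so an invalid
-- table name never reaches the generated SQL.
SELECT 'no_such_table_for_demo'::regclass;

-- The same column filter the function uses.
SELECT attnum, attname
FROM   pg_attribute
WHERE  attrelid = 'pg_catalog.pg_attribute'::regclass
AND    attnum >= 1                -- exclude system columns
AND    NOT attisdropped           -- exclude dropped columns
ORDER  BY attnum;
```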

If you need this in "real time", you won't be able to achieve it with an SQL query that needs a full table scan.

I would advise you to create a separate table containing the distinct values for each column (initialized with the SQL from @Erwin Brandstetter ;) and to maintain it using a trigger on the original table.

Your new table will have one column per field. The number of rows will equal the maximum number of distinct values for any one field.

On insert: for each field to maintain, check whether the value is already there. If not, add it.

On update: for each field to maintain whose old value differs from the new value, check whether the new value is already there. If not, add it. For the old value, check whether any other row still has it, and if not, remove it from the list (set the field to null).

On delete: for each field to maintain, check whether any other row still has that value, and if not, remove it from the list (set the field to null).

This way the load mainly moves to the trigger, and queries on the value list table will be very fast.

P.S.: Make sure to run all the SQL from the trigger through EXPLAIN to confirm that each statement uses the best available index and execution plan. For update/delete, just check whether the old value still exists (LIMIT 1).
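The insert/update/delete rules above could be sketched roughly as follows for a single column. Note the assumptions: all names (demo_ds, demo_ds_c1_vals, trg_demo_ds_c1, demo_ds_c1_sync) are hypothetical, ON CONFLICT requires PostgreSQL 9.5 or later, and the sketch deviates from the layout described above by using one narrow table per tracked column instead of one wide table with nulled-out slots, simply because that keeps the trigger short. It also ignores races between concurrent transactions.

```sql
-- Hypothetical source table and value list table for column c1.
CREATE TABLE demo_ds (id serial PRIMARY KEY, c1 int);
CREATE INDEX ON demo_ds (c1);                 -- keeps the EXISTS probe cheap
CREATE TABLE demo_ds_c1_vals (val int PRIMARY KEY);

CREATE OR REPLACE FUNCTION trg_demo_ds_c1()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   IF TG_OP = 'UPDATE' THEN
      -- Nothing to maintain if the tracked column did not change.
      IF NEW.c1 IS NOT DISTINCT FROM OLD.c1 THEN
         RETURN NULL;
      END IF;
   END IF;

   -- INSERT or UPDATE: add the new value unless it is already listed.
   IF TG_OP IN ('INSERT', 'UPDATE') THEN
      IF NEW.c1 IS NOT NULL THEN
         INSERT INTO demo_ds_c1_vals (val)
         VALUES (NEW.c1)
         ON CONFLICT (val) DO NOTHING;
      END IF;
   END IF;

   -- UPDATE or DELETE: drop the old value if no row still has it.
   IF TG_OP IN ('UPDATE', 'DELETE') THEN
      DELETE FROM demo_ds_c1_vals v
      WHERE  v.val = OLD.c1
      AND    NOT EXISTS (SELECT 1 FROM demo_ds d WHERE d.c1 = OLD.c1);
   END IF;

   RETURN NULL;  -- the return value is ignored for AFTER row triggers
END
$func$;

CREATE TRIGGER demo_ds_c1_sync
AFTER INSERT OR UPDATE OR DELETE ON demo_ds
FOR EACH ROW EXECUTE PROCEDURE trg_demo_ds_c1();
```

Reading the distinct values is then a plain SELECT val FROM demo_ds_c1_vals, with no scan of the source table. The NOT EXISTS probe stops at the first matching row, which is the same idea as the LIMIT 1 check mentioned above.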
