Merge a table and a change log into a view in PostgreSQL

予麋鹿 2020-12-06 08:31

My PostgreSQL database contains a table to store instances of a registered entity. This table is populated via spreadsheet upload. A web interface allows an operator to modi

1 answer
  • 2020-12-06 09:27

    Assuming Postgres 9.1 or later.
    I simplified / optimized your basic query to retrieve the latest values:

    SELECT DISTINCT ON (1,2)
           c.unique_id, a.attname AS col, c.value
    FROM   pg_attribute a
    LEFT   JOIN changes c ON c.column_name = a.attname
                         AND c.table_name  = 'instances'
                     --  AND c.unique_id   = 3  -- uncomment to fetch single row
    WHERE  a.attrelid = 'instances'::regclass   -- schema-qualify to be clear?
    AND    a.attnum > 0                         -- no system columns
    AND    NOT a.attisdropped                   -- no deleted columns
    ORDER  BY 1, 2, c.updated_at DESC;
    

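    I query the PostgreSQL system catalog instead of the standard information schema because that is faster. Note the cast to regclass: it resolves the table name to its object identifier (OID) according to the current search_path and raises an error if the table does not exist.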
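
    For comparison, a minimal sketch of both lookups, assuming the table lives in schema public (as in the function further down). The first query is the portable information schema counterpart of the catalog lookup; the second shows what the regclass cast resolves to:

    ```sql
    -- Portable but slower: column names from the standard information schema
    SELECT column_name
    FROM   information_schema.columns
    WHERE  table_schema = 'public'      -- assumption: table is in schema public
    AND    table_name   = 'instances'
    ORDER  BY ordinal_position;

    -- The cast turns the (optionally schema-qualified) name into the table's OID
    -- and raises an error if no such table can be found
    SELECT 'public.instances'::regclass      AS resolved_name
         , 'public.instances'::regclass::oid AS table_oid;
    ```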

    Now, that gives you a table. You want all values for one unique_id in a row.
    To achieve that you have basically three options:

    1. One subselect (or join) per column. Expensive and unwieldy. But a valid option for only a few columns.

    2. A big CASE statement.

    3. A pivot function. PostgreSQL provides the crosstab() function in the additional module tablefunc for that.
      Basic instructions:

      • PostgreSQL Crosstab Query
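
    To make the first two options concrete, here is a rough sketch against the test tables defined further down in this answer. The column names col1 and col2 come from that demo table; a real table needs one subselect or one CASE branch per column:

    ```sql
    -- Option 1: one correlated subselect per column
    SELECT i.unique_id
         , (SELECT c.value
            FROM   changes c
            WHERE  c.table_name  = 'instances'
            AND    c.column_name = 'col1'
            AND    c.unique_id   = i.unique_id
            ORDER  BY c.updated_at DESC
            LIMIT  1) AS col1
         , (SELECT c.value
            FROM   changes c
            WHERE  c.table_name  = 'instances'
            AND    c.column_name = 'col2'
            AND    c.unique_id   = i.unique_id
            ORDER  BY c.updated_at DESC
            LIMIT  1) AS col2
    FROM   instances i;

    -- Option 2: CASE expressions aggregated over the latest value per column
    SELECT unique_id
         , max(CASE WHEN column_name = 'col1' THEN value END) AS col1
         , max(CASE WHEN column_name = 'col2' THEN value END) AS col2
    FROM  (
       SELECT DISTINCT ON (unique_id, column_name)
              unique_id, column_name, value
       FROM   changes
       WHERE  table_name = 'instances'
       ORDER  BY unique_id, column_name, updated_at DESC
       ) latest
    GROUP  BY unique_id
    ORDER  BY unique_id;
    ```

    Option 1 returns a row for every instance, with NULL where no change exists. Option 2 only returns rows for instances that have at least one change.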

    Basic pivot table with crosstab()

    I completely rewrote the function:

    SELECT *
    FROM   crosstab(
        $x$
        SELECT DISTINCT ON (1, 2)
               unique_id, column_name, value
        FROM   changes
        WHERE  table_name = 'instances'
     -- AND    unique_id = 3  -- un-comment to fetch single row
        ORDER  BY 1, 2, updated_at DESC;
        $x$,
    
        $y$
        SELECT attname
        FROM   pg_catalog.pg_attribute
        WHERE  attrelid = 'instances'::regclass  -- possibly schema-qualify table name
        AND    attnum > 0
        AND    NOT attisdropped
        AND    attname <> 'unique_id'
        ORDER  BY attnum
        $y$
        )
    AS tbl (
     unique_id integer
    -- !!! You have to list all columns in order here !!! --
    );
    

    I separated the catalog lookup from the value query, as the crosstab() function with two parameters provides column names separately. Missing values (no entry in changes) are substituted with NULL automatically. A perfect match for this use case!

    This assumes that attname in the catalog matches column_name in changes. unique_id is excluded from the category list because it serves as the row identifier.
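
    For the two-column demo table from the test environment below, the completed column definition list could look like this. It is a sketch, assuming the tablefunc module is available on the server and that all value columns are of type text:

    ```sql
    -- crosstab() ships with the additional module tablefunc; install once per database
    CREATE EXTENSION IF NOT EXISTS tablefunc;

    SELECT *
    FROM   crosstab(
        $x$
        SELECT DISTINCT ON (1, 2)
               unique_id, column_name, value
        FROM   changes
        WHERE  table_name = 'instances'
        ORDER  BY 1, 2, updated_at DESC
        $x$,
        $y$
        SELECT attname
        FROM   pg_catalog.pg_attribute
        WHERE  attrelid = 'instances'::regclass
        AND    attnum > 0
        AND    NOT attisdropped
        AND    attname <> 'unique_id'
        ORDER  BY attnum
        $y$
        )
    AS tbl (
       unique_id integer   -- row identifier comes first
     , col1      text      -- then one column per category, in attnum order
     , col2      text
    );
    ```

    With the test data this should return one row each for unique_id 1, 2 and 3. Row 4 has no entries in changes and therefore does not appear.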

    Full automation

    Addressing your comment: There is a way to supply the column definition list automatically. It's not for the faint of heart, though.

    I use a number of advanced Postgres features here: crosstab(), plpgsql function with dynamic SQL, composite type handling, advanced dollar quoting, catalog lookup, aggregate function, window function, object identifier type, ...

    Test environment:

    CREATE TABLE instances (
      unique_id int
    , col1      text
    , col2      text -- two columns are enough for the demo
    );
    
    INSERT INTO instances VALUES
      (1, 'foo1', 'bar1')
    , (2, 'foo2', 'bar2')
    , (3, 'foo3', 'bar3')
    , (4, 'foo4', 'bar4');
    
    CREATE TABLE changes (
      unique_id   int
    , table_name  text
    , column_name text
    , value       text
    , updated_at  timestamp
    );
    
    INSERT INTO changes VALUES
      (1, 'instances', 'col1', 'foo11', '2012-04-12 00:01')
    , (1, 'instances', 'col1', 'foo12', '2012-04-12 00:02')
    , (1, 'instances', 'col1', 'foo1x', '2012-04-12 00:03')
    , (1, 'instances', 'col2', 'bar11', '2012-04-12 00:11')
    , (1, 'instances', 'col2', 'bar17', '2012-04-12 00:12')
    , (1, 'instances', 'col2', 'bar1x', '2012-04-12 00:13')
    
    , (2, 'instances', 'col1', 'foo2x', '2012-04-12 00:01')
    , (2, 'instances', 'col2', 'bar2x', '2012-04-12 00:13')
    
     -- NO change for col1 of row 3 - to test NULLs
    , (3, 'instances', 'col2', 'bar3x', '2012-04-12 00:13');
    
     -- NO changes at all for row 4 - to test NULLs
    

    Automated function for one table

    CREATE OR REPLACE FUNCTION f_curr_instance(int, OUT t public.instances) AS
    $func$
    BEGIN
       EXECUTE $f$
       SELECT *
       FROM   crosstab($x$
          SELECT DISTINCT ON (1,2)
                 unique_id, column_name, value
          FROM   changes
          WHERE  table_name = 'instances'
          AND    unique_id =  $f$ || $1 || $f$
          ORDER  BY 1, 2, updated_at DESC;
          $x$
        , $y$
          SELECT attname
          FROM   pg_catalog.pg_attribute
          WHERE  attrelid = 'public.instances'::regclass
          AND    attnum > 0
          AND    NOT attisdropped
          AND    attname <> 'unique_id'
          ORDER  BY attnum
          $y$) AS tbl ($f$
       || (SELECT string_agg(attname || ' ' || atttypid::regtype::text
                           , ', ' ORDER BY attnum) -- must be in order
           FROM   pg_catalog.pg_attribute
           WHERE  attrelid = 'public.instances'::regclass
           AND    attnum > 0
           AND    NOT attisdropped)
       || ')'
       INTO t;
    END
    $func$  LANGUAGE plpgsql;
    

    The table instances is hard-coded, schema qualified to be unambiguous. Note the use of the table type as return type. There is a row type registered automatically for every table in PostgreSQL. This is bound to match the return type of the crosstab() function.
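
    When debugging the dynamic SQL, it can help to run the subquery that builds the column definition list on its own. This is the same expression as in the function, just extracted:

    ```sql
    -- Builds the column definition list that the function appends after "AS tbl ("
    SELECT string_agg(attname || ' ' || atttypid::regtype::text
                    , ', ' ORDER BY attnum) AS column_definition_list
    FROM   pg_catalog.pg_attribute
    WHERE  attrelid = 'public.instances'::regclass
    AND    attnum > 0
    AND    NOT attisdropped;
    ```

    For the demo table this should yield: unique_id integer, col1 text, col2 text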

    This binds the function to the type of the table:

    • You will get an error message if you try to DROP the table.
    • Your function will fail after an ALTER TABLE. You have to recreate it (without changes). I consider this a bug in 9.1: ALTER TABLE should raise an error instead of silently breaking the function.

    This performs very well.

    Call:

    SELECT * FROM f_curr_instance(3);
    
    unique_id | col1   | col2
    ----------+--------+-------
            3 | <NULL> | bar3x
    

    Note how col1 is NULL here.
    Use in a query to display an instance with its latest values:

    SELECT i.unique_id
         , COALESCE(c.col1, i.col1) AS col1  -- latest change, else original value
         , COALESCE(c.col2, i.col2) AS col2
    FROM   instances i
    LEFT   JOIN f_curr_instance(3) c USING (unique_id)
    WHERE  i.unique_id = 3;
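
    To cover all instances at once, the same pattern could be wrapped in a view. This is a sketch only: the view name instances_current is made up for illustration, and LATERAL requires Postgres 9.3 or later, which is newer than the 9.1 assumed above:

    ```sql
    -- Hypothetical view name; calls the one-table function once per instance
    CREATE VIEW instances_current AS
    SELECT i.unique_id
         , COALESCE(c.col1, i.col1) AS col1  -- latest change, else original value
         , COALESCE(c.col2, i.col2) AS col2
    FROM   instances i
    LEFT   JOIN LATERAL f_curr_instance(i.unique_id) c ON true;

    SELECT * FROM instances_current ORDER BY unique_id;
    ```

    Instances without any change (row 4 in the test data) get an all-NULL row from the function, so COALESCE falls back to the stored values.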
    

    Full automation for any table

    (Added 2016. This is dynamite.)
    Requires Postgres 9.1 or later. (It could be made to work with pg 8.4, but I didn't bother to backpatch.)

    CREATE OR REPLACE FUNCTION f_curr_instance(_id int, INOUT _t ANYELEMENT) AS
    $func$
    DECLARE
       _type text := pg_typeof(_t);
    BEGIN
       EXECUTE
       (
       SELECT format
             ($f$
             SELECT *
             FROM   crosstab(
                $x$
                SELECT DISTINCT ON (1,2)
                       unique_id, column_name, value
                FROM   changes
                WHERE  table_name = %1$L
                AND    unique_id  = %2$s
                ORDER  BY 1, 2, updated_at DESC;
                $x$    
              , $y$
                SELECT attname
                FROM   pg_catalog.pg_attribute
                WHERE  attrelid = %1$L::regclass
                AND    attnum > 0
                AND    NOT attisdropped
                AND    attname <> 'unique_id'
                ORDER  BY attnum
                $y$) AS ct (%3$s)
             $f$
              , _type, _id
              , string_agg(attname || ' ' || atttypid::regtype::text
                         , ', ' ORDER BY attnum)  -- must be in order
             )
       FROM   pg_catalog.pg_attribute
       WHERE  attrelid = _type::regclass
       AND    attnum > 0
       AND    NOT attisdropped
       )
       INTO _t;
    END
    $func$  LANGUAGE plpgsql;
    

    Call (providing the table type with NULL::public.instances):

    SELECT * FROM f_curr_instance(3, NULL::public.instances);
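
    The generic variant can be combined with the base table in the same way as the one-table function. The function compares changes.table_name with the text form of the type, so this sketch assumes that instances is visible in the search_path and the type therefore prints without a schema prefix:

    ```sql
    -- Latest values for instance 3, falling back to the stored values
    SELECT i.unique_id
         , COALESCE(c.col1, i.col1) AS col1
         , COALESCE(c.col2, i.col2) AS col2
    FROM   instances i
    LEFT   JOIN f_curr_instance(3, NULL::public.instances) c USING (unique_id)
    WHERE  i.unique_id = 3;
    ```

    With the test data, col1 should fall back to the stored value foo3, while col2 shows the latest change bar3x.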
    

    Related:

    • Refactor a PL/pgSQL function to return the output of various SELECT queries
    • How to set value of composite variable field using dynamic SQL