Execute a dynamic crosstab query

迷失自我 2020-11-27 22:45

I implemented this function in my Postgres database: http://www.cureffi.org/2013/03/19/automatically-creating-pivot-table-column-names-in-postgresql/

Here's the fun

2 Answers
  • 2020-11-27 22:57

    What you ask for is impossible. SQL is a strictly typed language. PostgreSQL functions need to declare a return type (RETURNS ..) at the time of creation.

    A limited way around this is to use polymorphic functions, provided you can supply the return type at the time of the function call. But that's not evident from your question.

    • Refactor a PL/pgSQL function to return the output of various SELECT queries
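
    A minimal sketch of the polymorphic approach, assuming a hypothetical table tbl and a hypothetical function name data_of (neither is from the question). The caller fixes the return type by passing a typed NULL:

    -- Hypothetical example: the row type is chosen by the caller via _tbl_type.
    CREATE OR REPLACE FUNCTION data_of(_tbl_type anyelement)
      RETURNS SETOF anyelement AS
    $func$
    BEGIN
       -- pg_typeof() returns regtype, which is rendered safely quoted by %s
       RETURN QUERY EXECUTE format('SELECT * FROM %s', pg_typeof(_tbl_type));
    END
    $func$ LANGUAGE plpgsql;

    -- Call with a typed NULL of the table's row type:
    SELECT * FROM data_of(NULL::tbl);

    This only helps when a matching row type already exists, which is typically not the case for a crosstab with a varying set of categories.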

    You can return a completely dynamic result with anonymous records. But then you are required to provide a column definition list with every call. And how do you know about the returned columns? Catch 22.
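
    To illustrate the catch, here is a sketch with a hypothetical function run_query (not from the question). It executes arbitrary SQL, so it is for demonstration only:

    -- Hypothetical function returning anonymous records.
    CREATE OR REPLACE FUNCTION run_query(_sql text)
      RETURNS SETOF record AS
    $func$
    BEGIN
       RETURN QUERY EXECUTE _sql;
    END
    $func$ LANGUAGE plpgsql;

    -- The caller must spell out a matching column definition list:
    SELECT * FROM run_query('SELECT 1, ''a''::text') AS t(id int, label text);

    Leaving out the AS t(...) part raises an error, and the listed types have to match what the query actually returns.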

    There are various workarounds, depending on what you need or can work with. Since all your data columns seem to share the same data type, I suggest returning an array: text[]. Or you could return a document type like hstore or json. Related:

    • Dynamic alternative to pivot with CASE and GROUP BY

    • Dynamically convert hstore keys into columns for an unknown set of keys
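
    As a rough sketch of the document-type variant, assuming a hypothetical table tbl(row_name text, cat text, val int) and Postgres 9.4 or later for json_object_agg():

    -- One row per row_name, with all categories folded into a single json column.
    SELECT row_name, json_object_agg(cat, total ORDER BY cat) AS cats
    FROM  (
       SELECT row_name, cat, sum(val) AS total
       FROM   tbl
       GROUP  BY 1, 2
       ) sub
    GROUP  BY row_name
    ORDER  BY row_name;

    The return type is always (text, json), no matter how many categories exist, so it can be declared up front.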

    But it might be simpler to just use two calls: 1: Let Postgres build the query. 2: Execute and retrieve returned rows.

    • Selecting multiple max() values using a single SQL statement

    I would not use the function from Eric Minikel as presented in your question at all. It is not safe against SQL injection by way of maliciously malformed identifiers. Use format() to build query strings unless you are running an outdated version older than Postgres 9.1.
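
    A small illustration of how format() quotes its arguments (the identifiers and the value are made up for the example):

    -- %I quotes identifiers where necessary, %L quotes literals.
    SELECT format('SELECT %I FROM %I WHERE note = %L'
                , 'my col', 'My Table', 'O''Reilly');
    -- returns: SELECT "my col" FROM "My Table" WHERE note = 'O''Reilly'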

    A shorter and cleaner implementation could look like this:

    CREATE OR REPLACE FUNCTION xtab(_tbl regclass, _row text, _cat text
                                  , _expr text  -- still vulnerable to SQL injection!
                                  , _type regtype)
      RETURNS text AS
    $func$
    DECLARE
       _cat_list text;
       _col_list text;
    BEGIN
    
    -- generate categories for xtab param and col definition list    
    EXECUTE format(
     $$SELECT string_agg(quote_literal(x.cat), '), (')
            , string_agg(quote_ident  (x.cat), %L)
       FROM  (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
     , ' ' || _type || ', ', _cat, _tbl)
    INTO  _cat_list, _col_list;
    
    -- generate query string
    RETURN format(
    'SELECT * FROM crosstab(
       $q$SELECT %I, %I, %s
          FROM   %s
          GROUP  BY 1, 2  -- only works if the 3rd column is an aggregate expression
          ORDER  BY 1, 2$q$
     , $c$VALUES (%5$s)$c$
       ) ct(%1$I text, %6$s %7$s)'
    , _row, _cat, _expr  -- expr must be an aggregate expression!
    , _tbl, _cat_list, _col_list, _type
    );
    
    END
    $func$ LANGUAGE plpgsql;
    

    Same function call as your original version. The function crosstab() is provided by the additional module tablefunc which has to be installed. Basics:

    • PostgreSQL Crosstab Query
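
    A sketch of the two-call workflow with the function above, assuming a hypothetical table tbl (the table and its columns are not from the question):

    -- Hypothetical setup for illustration.
    CREATE EXTENSION IF NOT EXISTS tablefunc;
    CREATE TABLE tbl (row_name text, cat text, val int);
    INSERT INTO tbl VALUES ('a', 'x', 1), ('a', 'y', 2), ('b', 'x', 3);

    -- Call 1: let Postgres build the query string.
    SELECT xtab('tbl', 'row_name', 'cat', 'sum(val)', 'bigint');

    -- Call 2: send the returned string back to the server as its own statement.
    -- In psql 9.6 or later, ending call 1 with \gexec instead of the semicolon
    -- executes the returned string directly.

    The second call is an ordinary crosstab() query with an explicit column definition list, so the planner knows the row type before execution.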

    This handles column and table names safely. Note the use of object identifier types regclass and regtype. Also works for schema-qualified names.

    • Table name as a PostgreSQL function parameter

    However, it is not completely safe while you pass a string to be executed as expression (_expr - cellc in your original query). This kind of input is inherently unsafe against SQL injection and should never be exposed to the general public.

    • SQL injection in Postgres functions vs prepared queries

    Scans the table only once for both lists of categories and should be a bit faster.

    Still can't return completely dynamic row types since that's strictly not possible.

  • 2020-11-27 23:16

    Not quite impossible: you can still execute it (execute the query string from a function and return SETOF RECORD).

    Then you have to specify the return record format. The reason in this case is that the planner needs to know the return format before it can make certain decisions (materialization comes to mind).

    So in this case you would EXECUTE the query, return the rows, and declare the function as returning SETOF RECORD.

    For example, we could do something like this with a wrapper function but the same logic could be folded into your function:

    CREATE OR REPLACE FUNCTION crosstab_wrapper
    (tablename varchar, rowc varchar, colc varchar,
     cellc varchar, celldatatype varchar)
    returns setof record language plpgsql as $$
        DECLARE outrow record;
        BEGIN
           -- xtab() returns the query string; the explicit casts match its
           -- regclass / regtype parameters
           FOR outrow IN EXECUTE xtab($1::regclass, $2, $3, $4, $5::regtype)
           LOOP
               RETURN NEXT outrow;
           END LOOP;
        END;
     $$;
    

    You then supply the record structure when calling the function, just as you do with crosstab: the call has to include a column definition list (AS (col1 type, col2 type, etc.)), as with connectby.
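
    For instance, a call might look like this, assuming a hypothetical table tbl(row_name text, cat text, val int) whose cat column holds only the values x and y:

    -- The column definition list must match the crosstab output exactly.
    SELECT *
    FROM   crosstab_wrapper('tbl', 'row_name', 'cat', 'sum(val)', 'bigint')
           AS ct(row_name text, x bigint, y bigint);

    If a new category shows up in the table, the list in the call has to be extended to match, so the caller still needs to know the columns in advance.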
