How to update all columns with INSERT … ON CONFLICT …?

Submitted by 99封情书 on 2020-01-21 06:27:17

Question


I have a table with a single primary key. When I attempt an insert, there may be a conflict caused by trying to insert a row with an existing key. In that case I want the insert to update all columns. Is there an easy syntax for this? I am trying to let it "upsert" all columns.

I am using PostgreSQL 9.5.5.


Answer 1:


The UPDATE syntax requires explicitly naming the target columns. Possible reasons to avoid that:

  • You have many columns and just want to shorten the syntax.
  • You do not know column names except for the unique column(s).

"All columns" has to mean "all columns of the target table" (or at least "leading columns of the table") in matching order and matching data type. Else you'd have to provide a list of target column names anyway.

Test table:

CREATE TABLE tbl (
   id    int PRIMARY KEY
 , text  text
 , extra text
);

INSERT INTO tbl AS t
VALUES (1, 'foo')
     , (2, 'bar');
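
For comparison, the explicit form that the approaches below try to avoid looks roughly like this. It is only a minimal sketch against the test table above, and the sample values are made up for illustration. Every non-key column has to be spelled out in the SET clause, with EXCLUDED referring to the row proposed for insertion.

```sql
-- Explicit UPSERT: every non-key column is named in SET.
-- EXCLUDED is the special table holding the row proposed for insertion.
INSERT INTO tbl (id, text, extra)
VALUES (1, 'foo_upd', 'a')   -- id 1 exists: the row is updated
     , (3, 'baz', 'c')       -- id 3 is new: the row is inserted
ON     CONFLICT (id) DO UPDATE
SET    text  = EXCLUDED.text
     , extra = EXCLUDED.extra;
```

With many columns, or with column names that are not known in advance, this list becomes the part worth automating.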

1. DELETE & INSERT in single query instead

Without knowing any column names except id.

Only works for "all columns of the target table". While the syntax even works for a leading subset, excess columns in the target table would be reset to NULL with DELETE and INSERT.

UPSERT (INSERT ... ON CONFLICT ...) is needed to avoid concurrency / locking issues under concurrent write load, and only because there is no general way to lock not-yet-existing rows in Postgres (value locking).

Your special requirement only affects the UPDATE part. Possible complications do not apply where existing rows are affected. Those are locked properly. Simplifying some more, you can reduce your case to DELETE and INSERT:

WITH data(id) AS (              -- Only 1st column gets explicit name!
   VALUES
      (1, 'foo_upd', 'a')       -- changed
    , (2, 'bar', 'b')           -- unchanged
    , (3, 'baz', 'c')           -- new
   )
, del AS (
   DELETE FROM tbl AS t
   USING  data d
   WHERE  t.id = d.id
   -- AND    t <> d              -- optional, to avoid empty updates
   )                             -- only works for complete rows
INSERT INTO tbl AS t
TABLE  data                      -- short for: SELECT * FROM data
ON     CONFLICT (id) DO NOTHING
RETURNING t.id;

In the Postgres MVCC model, an UPDATE is largely the same as DELETE and INSERT anyway (except for some corner cases with concurrency, HOT updates, and big column values stored out of line). Since you want to replace all rows anyway, just remove conflicting rows before the INSERT. Deleted rows remain locked until the transaction is committed. The INSERT might only find conflicting rows for previously non-existing key values if a concurrent transaction happens to insert them concurrently (after the DELETE, but before the INSERT).

You would lose additional column values for affected rows in this special case. No exception is raised. But if competing queries have equal priority, that's hardly a problem: the other query won for some rows. Also, if the other query is a similar UPSERT, its alternative is to wait for this transaction to commit and then update right away. "Winning" could be a Pyrrhic victory.

About "empty updates":

  • How do I (or can I) SELECT DISTINCT on multiple columns?

No, my query must win!

OK, you asked for it:

WITH data(id) AS (                   -- Only 1st column gets explicit name!
   VALUES                            -- rest gets default names "column2", etc.
   (1, 'foo_upd', NULL)              -- changed
 , (2, 'bar', NULL)                  -- unchanged
 , (3, 'baz', NULL)                  -- new
 , (4, 'baz', NULL)                  -- new
   )
 , ups AS (
   INSERT INTO tbl AS t
   TABLE  data                       -- short for: SELECT * FROM data
   ON     CONFLICT (id) DO UPDATE
   SET    id = t.id
   WHERE  false                      -- never executed, but locks the row!
   RETURNING t.id
   )
 , del AS (
   DELETE FROM tbl AS t
   USING  data     d
   LEFT   JOIN ups u USING (id)
   WHERE  u.id IS NULL               -- not inserted !
   AND    t.id = d.id
   -- AND    t <> d                  -- avoid empty updates - only for full rows
   RETURNING t.id
   )
 , ins AS (
   INSERT INTO tbl AS t
   SELECT *
   FROM   data
   JOIN   del USING (id)             -- conflict impossible!
   RETURNING id
   )
SELECT ARRAY(TABLE ups) AS inserted  -- with UPSERT
     , ARRAY(TABLE ins) AS updated   -- with DELETE & INSERT
;

How?

  • The 1st CTE data just provides data. Could be a table instead.
  • The 2nd CTE ups: UPSERT. Rows with conflicting id are not changed, but also locked.
  • The 3rd CTE del deletes conflicting rows. They remain locked.
  • The 4th CTE ins inserts whole rows. Only allowed for the same transaction.
  • The final SELECT is only for the demo to show what happened.

To check for empty updates, test (before and after) with:

SELECT ctid, * FROM tbl; -- did the ctid change?
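
A minimal sketch of such a check on the test table, assuming a row with id = 2 exists and no trigger suppresses redundant updates. An UPDATE that changes no values still writes a new row version, which shows up as a different ctid.

```sql
SELECT ctid, * FROM tbl WHERE id = 2;      -- note the ctid before

UPDATE tbl SET text = text WHERE id = 2;   -- "empty" update: no value changes

SELECT ctid, * FROM tbl WHERE id = 2;      -- the ctid differs: a new row version was written
```

If the ctid is the same before and after one of the queries above, the row was not rewritten.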

2. Dynamic SQL

This works for a subset of leading columns too, preserving existing values.

The trick is to let Postgres build the query string with column names from the system catalogs dynamically, and then execute it.

See related answers for code:

  • Update multiple columns in a trigger function in plpgsql

  • Bulk update of all columns

  • SQL update fields of one table from fields of another one
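
The idea can be sketched as follows. This is not the code from the linked answers, only an illustration under stated assumptions: src is a hypothetical staging table with the same columns as tbl in the same order, and id is the only conflict column. The SET list is generated from pg_attribute as plain "col = EXCLUDED.col" pairs.

```sql
-- Hypothetical staging table (assumption): same columns as tbl, same order
CREATE TEMP TABLE src (LIKE tbl);
INSERT INTO src VALUES (1, 'foo_upd', 'a'), (3, 'baz', 'c');

DO
$do$
DECLARE
   _sql text;
BEGIN
   -- Build "col = EXCLUDED.col, ..." for every column except the key
   SELECT 'INSERT INTO tbl TABLE src ON CONFLICT (id) DO UPDATE SET '
       || string_agg(format('%1$I = EXCLUDED.%1$I', attname), ', ' ORDER BY attnum)
   INTO   _sql
   FROM   pg_attribute
   WHERE  attrelid = 'tbl'::regclass   -- target table
   AND    attnum > 0                   -- skip system columns
   AND    NOT attisdropped             -- skip dropped columns
   AND    attname <> 'id';             -- skip the conflict column

   RAISE NOTICE '%', _sql;             -- inspect the generated statement
   EXECUTE _sql;
END
$do$;
```

format() with %I quotes identifiers where needed, so unusual column names should not break the generated statement. Generating individual assignments rather than a parenthesized column list is a design choice here, so that a table with a single non-key column needs no special handling.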




Answer 2:


As I lack the reputation to comment: Erwin Brandstetter's answer seems to fail when the id column is not the first column.

The following uses a snippet from one of his other answers to reproduce the 'return ins/ups' functionality in my case:

DO
$do$
BEGIN
EXECUTE (
SELECT
'DROP TABLE IF EXISTS res_tbl; CREATE TABLE res_tbl AS
WITH 
    ins AS (
       INSERT INTO dest
       TABLE  src                             -- short for: SELECT * FROM src
       ON     CONFLICT (id) DO UPDATE
       SET    id = dest.id
       WHERE  false                             -- never executed, but locks the row!
       RETURNING id
    ),
    repl AS (
        UPDATE dest
        SET   (' || string_agg(quote_ident(column_name), ',') || ')
         = (' || string_agg('src.' || quote_ident(column_name), ',') || ')
        FROM   src
        WHERE  src.id = dest.id
        AND src <> dest
        -- ^ avoids empty updates - only for full-row updates where all columns are comparable (e.g. jsonb not json)
        RETURNING dest.id
    )
SELECT ARRAY(TABLE ins) AS inserted  -- with UPSERT
     , ARRAY(TABLE repl) AS updated  -- with DYNAMIC UPDATE
;'
FROM   information_schema.columns
WHERE  table_name   = 'src'      -- table name, case sensitive
AND    table_schema = 'public'       -- schema name, case sensitive
AND    column_name <> 'id'      -- all columns except id
);
END
$do$;


Source: https://stackoverflow.com/questions/40687267/how-to-update-all-columns-with-insert-on-conflict
