Replace looping with a single query for INSERT / UPDATE

问题

I am writing a function in PostgreSQL. It does basically 3 steps:

Fetch a record from source table.
check the value from the fetched record in target table, if record is found in target table then update all values of target table with fetched record otherwise insert fetched record to target table.

Instead of doing this looping, if I write single query for insert/update, will it be faster than above mentioned approach? How can I achieve same result by writing single query instead looping through every records and doing updation/insertion.

My current approach is as below

CREATE OR REPLACE FUNCTION fun1()
  RETURNS void AS
$BODY$DECLARE
   source_tab_row RECORD;

   v_col1 TEXT;
   v_col2 TEXT;
   v_col3 TEXT;
   v_col4 double precision ;
   cnt integer;

BEGIN
    FOR source_tab_row IN (SELECT * FROM source_tab where col5='abc')
LOOP
    v_col1=source_tab_row.col1;   
    v_col2=source_tab_row.col2;
    v_col3=source_tab_row.col3;
    v_col4=source_tab_row.col4;

    select count(*) INTO cnt from dest_tab where col1=v_col1;

     if (cnt =0) then
     -- If records is not found
       INSERT INTO dest_tab(col1, col2, col3,col4)
       VALUES( v_col1, v_col2, v_col3,v_col4)   ;
    else
     --if records found then update it
       update dest_tab set col1=v_col1, col2=v_col2, col3=v_col3,col4=v_col4
       where col1=v_col1;

     end if;         
END LOOP;
END;
$BODY$ LANGUAGE plpgsql;

回答1:

Better SQL

If you have PostgreSQL 9.1 or later, you should definitely use a data-modifying CTE for this:

WITH x AS (
   UPDATE dest_tab d
   SET    col2 = s.col2
        , col3 = s.col3
   --   , ...
   FROM   source_tab s
   WHERE  s.col5 = 'abc'
   AND    s.col1 = d.col1

   RETURNING col1
   )
INSERT INTO dest_tab(col1, col2, col3, col4)
SELECT s.col1, s.col2, s.col3, s.col4
FROM   source_tab s
WHERE  s.col5 = 'abc'
LEFT   JOIN x USING (col1)
WHERE  x.col1 IS NULL;

As @Craig already posted, such operations are regularly much faster as set-based SQL than by iterating through individual rows.

However, this form is faster and simpler. It also avoids the inherent (tiny!) race condition to a large extent. To begin with, as this is a single SQL command, the time slot is even shorter. Also, if a concurrent transaction should enter competing rows between the UPDATE and the INSERT, you get a duplicate key violation (provided you have a pk / unique constraint as you should). Because you don't query dest_tab a second time and reuse the original set for the INSERT. Faster, better.

If you ever get to see a duplicate key violation: nothing bad happened, just retry the query.

It does not cover the opposite case where a concurrent transaction would DELETE a row in the meantime. This is really the less important / frequent case, IMO.

Proper plpgsql

If you use plpgsql for this, simplify:

CREATE OR REPLACE FUNCTION fun1()
  RETURNS void AS
$BODY$
DECLARE
   _source source_tab;  -- name of table = type
BEGIN
   FOR _source IN
      SELECT * FROM source_tab where col5 = 'abc'
   LOOP
        UPDATE dest_tab
        SET    col2 = _source.col2  -- don't update col1, it doesn't change
               ,col3 = _source.col3
               ,col4 = _source.col4
        WHERE  col1 = _source.col1;

      IF NOT FOUND THEN  -- no row found
         INSERT INTO dest_tab(col1, col2, col3,col4)
         VALUES (_source.col1, _source.col2, _source.col3, _source.col4);
      END IF;

   END LOOP;
END
$BODY$ LANGUAGE plpgsql;

来源：https://stackoverflow.com/questions/13233369/replace-looping-with-a-single-query-for-insert-update

标签

sql

postgresql

plpgsql

upsert

postgresql-performance