Question
I am writing a function in PostgreSQL. For each record, it basically does the following:
- Fetch a record from the source table.
- Look up the fetched record's value in the target table. If a matching record is found, update all of its values from the fetched record; otherwise, insert the fetched record into the target table.
If I replace this loop with a single query for the insert/update, will it be faster than the approach above? How can I achieve the same result with a single query instead of looping through every record and doing an update or insert each time?
My current approach is as follows:
CREATE OR REPLACE FUNCTION fun1()
  RETURNS void AS
$BODY$
DECLARE
   source_tab_row RECORD;
   v_col1 TEXT;
   v_col2 TEXT;
   v_col3 TEXT;
   v_col4 double precision;
   cnt integer;
BEGIN
   FOR source_tab_row IN (SELECT * FROM source_tab where col5='abc')
   LOOP
      v_col1=source_tab_row.col1;
      v_col2=source_tab_row.col2;
      v_col3=source_tab_row.col3;
      v_col4=source_tab_row.col4;
      select count(*) INTO cnt from dest_tab where col1=v_col1;
      if (cnt =0) then
         -- If no record is found, insert it
         INSERT INTO dest_tab(col1, col2, col3,col4)
         VALUES( v_col1, v_col2, v_col3,v_col4) ;
      else
         -- If a record is found, update it
         update dest_tab set col1=v_col1, col2=v_col2, col3=v_col3,col4=v_col4
         where col1=v_col1;
      end if;
   END LOOP;
END;
$BODY$ LANGUAGE plpgsql;
Answer 1:
Better SQL
If you have PostgreSQL 9.1 or later, you should definitely use a data-modifying CTE for this:
WITH x AS (
   UPDATE dest_tab d
   SET    col2 = s.col2
        , col3 = s.col3
       -- , ...
   FROM   source_tab s
   WHERE  s.col5 = 'abc'
   AND    s.col1 = d.col1
   RETURNING d.col1  -- qualified: col1 exists in both d and s
   )
INSERT INTO dest_tab(col1, col2, col3, col4)
SELECT s.col1, s.col2, s.col3, s.col4
FROM   source_tab s
LEFT   JOIN x USING (col1)  -- joins must come before the WHERE clause
WHERE  s.col5 = 'abc'
AND    x.col1 IS NULL;      -- keep only rows the UPDATE did not touch
As @Craig already posted, such operations are regularly much faster as set-based SQL than by iterating through individual rows.
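To make the behavior of the CTE concrete, here is a minimal sketch that runs it against throwaway sample data. The column types, the primary key on dest_tab.col1, and the sample rows are assumptions for illustration; they are not given in the question.

```sql
-- Hypothetical sample schema; column types are guessed from the
-- variables declared in the question's function.
CREATE TEMP TABLE source_tab (
   col1 text, col2 text, col3 text, col4 double precision, col5 text
);
CREATE TEMP TABLE dest_tab (
   col1 text PRIMARY KEY, col2 text, col3 text, col4 double precision
);

-- Illustrative rows only.
INSERT INTO source_tab VALUES
   ('a', 'new2', 'new3', 1.5, 'abc'),  -- exists in dest_tab: will be updated
   ('b', 'b2',   'b3',   2.5, 'abc'),  -- missing in dest_tab: will be inserted
   ('c', 'c2',   'c3',   3.5, 'xyz');  -- filtered out by col5
INSERT INTO dest_tab VALUES ('a', 'old2', 'old3', 0.0);

WITH x AS (
   UPDATE dest_tab d
   SET    col2 = s.col2, col3 = s.col3, col4 = s.col4
   FROM   source_tab s
   WHERE  s.col5 = 'abc'
   AND    s.col1 = d.col1
   RETURNING d.col1            -- keys that were updated
   )
INSERT INTO dest_tab (col1, col2, col3, col4)
SELECT s.col1, s.col2, s.col3, s.col4
FROM   source_tab s
LEFT   JOIN x USING (col1)
WHERE  s.col5 = 'abc'
AND    x.col1 IS NULL;         -- only keys the UPDATE did not find

SELECT * FROM dest_tab ORDER BY col1;
```

With this sample data, dest_tab should end up with two rows: 'a' carrying the updated values and 'b' newly inserted, while 'c' is ignored because its col5 does not match.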
However, this form is faster and simpler. It also largely avoids the inherent (tiny!) race condition. To begin with, as this is a single SQL command, the time slot is even shorter. Also, if a concurrent transaction should insert competing rows between the UPDATE and the INSERT, you get a duplicate key violation (provided you have a primary key or unique constraint, as you should), because the query does not read dest_tab a second time and reuses the original set for the INSERT. Faster, better.
If you ever get to see a duplicate key violation: nothing bad happened, just retry the query.
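If you prefer to handle the retry inside the database, one option is a small PL/pgSQL wrapper that traps the duplicate key error and runs the statement again. This is only a sketch: the function name upsert_dest_tab_retry and the _max_attempts parameter are hypothetical, and it assumes the default READ COMMITTED isolation level, where the retried statement can see the concurrently committed rows.

```sql
-- Hypothetical wrapper; name and retry limit are illustrative.
CREATE OR REPLACE FUNCTION upsert_dest_tab_retry(_max_attempts int DEFAULT 3)
  RETURNS void AS
$BODY$
BEGIN
   FOR _attempt IN 1 .. _max_attempts LOOP
      BEGIN  -- a block with an EXCEPTION clause runs as a subtransaction
         WITH x AS (
            UPDATE dest_tab d
            SET    col2 = s.col2, col3 = s.col3, col4 = s.col4
            FROM   source_tab s
            WHERE  s.col5 = 'abc'
            AND    s.col1 = d.col1
            RETURNING d.col1
            )
         INSERT INTO dest_tab (col1, col2, col3, col4)
         SELECT s.col1, s.col2, s.col3, s.col4
         FROM   source_tab s
         LEFT   JOIN x USING (col1)
         WHERE  s.col5 = 'abc'
         AND    x.col1 IS NULL;

         RETURN;  -- success, stop retrying
      EXCEPTION WHEN unique_violation THEN
         -- The failed attempt has been rolled back; loop and try again.
         NULL;
      END;
   END LOOP;

   RAISE EXCEPTION 'upsert still failing after % attempts', _max_attempts;
END
$BODY$ LANGUAGE plpgsql;
```

Calling SELECT upsert_dest_tab_retry(); would then run the statement up to three times. Retrying from the client is just as valid; the wrapper only saves round trips.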
The data-modifying CTE does not cover the opposite case, where a concurrent transaction DELETEs a row in the meantime. This is really the less important and less frequent case, IMO.
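On PostgreSQL 9.5 or later there is also a built-in alternative, INSERT ... ON CONFLICT DO UPDATE, which the PostgreSQL documentation describes as giving an atomic insert-or-update outcome. The sketch below assumes a primary key or unique constraint on dest_tab.col1, which the question does not state, and assumes the selected source rows contain no duplicate col1 values (otherwise the statement fails because it would affect the same row twice).

```sql
-- Assumes PostgreSQL 9.5+ and a PRIMARY KEY or UNIQUE constraint on dest_tab.col1.
INSERT INTO dest_tab (col1, col2, col3, col4)
SELECT s.col1, s.col2, s.col3, s.col4
FROM   source_tab s
WHERE  s.col5 = 'abc'
ON CONFLICT (col1) DO UPDATE        -- a row with this col1 already exists
SET    col2 = EXCLUDED.col2         -- EXCLUDED = the row proposed for insertion
     , col3 = EXCLUDED.col3
     , col4 = EXCLUDED.col4;
```

This keeps the work in a single set-based statement, like the CTE, and needs no retry logic for concurrent inserts.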
Proper plpgsql
If you use plpgsql for this, simplify:
CREATE OR REPLACE FUNCTION fun1()
  RETURNS void AS
$BODY$
DECLARE
   _source source_tab;  -- name of table = type
BEGIN
   FOR _source IN
      SELECT * FROM source_tab where col5 = 'abc'
   LOOP
      UPDATE dest_tab
      SET    col2 = _source.col2  -- don't update col1, it doesn't change
           , col3 = _source.col3
           , col4 = _source.col4
      WHERE  col1 = _source.col1;

      IF NOT FOUND THEN  -- no row found
         INSERT INTO dest_tab(col1, col2, col3, col4)
         VALUES (_source.col1, _source.col2, _source.col3, _source.col4);
      END IF;
   END LOOP;
END
$BODY$ LANGUAGE plpgsql;
Source: https://stackoverflow.com/questions/13233369/replace-looping-with-a-single-query-for-insert-update