Question
I have a stock_price_alert table with 3 columns. stock_price_id is the PRIMARY KEY and also a FOREIGN KEY to another table. The table definition is below:
create table stock_price_alert (
stock_price_id integer references stock_price (id) on delete cascade not null,
fall_below_alert boolean not null,
rise_above_alert boolean not null,
primary key (stock_price_id)
);
I need to either:
1) INSERT the record if it does not exist
-- query 1
INSERT INTO stock_price_alert (stock_price_id, fall_below_alert, rise_above_alert)
VALUES (1, true, false);
2) UPDATE the record if it exists
-- query 2
UPDATE stock_price_alert SET
fall_below_alert = true,
rise_above_alert = false
WHERE stock_price_id = 1;
First I need to issue a SELECT query on the stock_price_alert table in order to decide whether to perform query (1) or (2).
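The SELECT-then-branch approach described above can be sketched as follows. This is a minimal sketch, assuming Python's bundled sqlite3 module as a stand-in for Postgres so it runs without a server; the helper name save_alert is hypothetical, and the foreign key to stock_price is omitted so the example is self-contained. Because the existence check and the write are separate statements, another session could insert the same stock_price_id between them unless extra locking is added.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; the FOREIGN KEY to stock_price
# is left out so no second table is needed.
SCHEMA = """
CREATE TABLE stock_price_alert (
    stock_price_id INTEGER PRIMARY KEY NOT NULL,
    fall_below_alert BOOLEAN NOT NULL,
    rise_above_alert BOOLEAN NOT NULL
)
"""


def save_alert(conn, stock_price_id, fall_below, rise_above):
    """Hypothetical helper: SELECT first, then run query (1) or query (2)."""
    cur = conn.execute(
        "SELECT 1 FROM stock_price_alert WHERE stock_price_id = ?",
        (stock_price_id,),
    )
    if cur.fetchone() is None:
        # Query (1): no row yet, so insert one.
        conn.execute(
            "INSERT INTO stock_price_alert "
            "(stock_price_id, fall_below_alert, rise_above_alert) VALUES (?, ?, ?)",
            (stock_price_id, fall_below, rise_above),
        )
    else:
        # Query (2): the row exists, so overwrite both flags.
        conn.execute(
            "UPDATE stock_price_alert SET fall_below_alert = ?, rise_above_alert = ? "
            "WHERE stock_price_id = ?",
            (fall_below, rise_above, stock_price_id),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_alert(conn, 1, True, False)  # takes the INSERT branch
save_alert(conn, 1, False, True)  # takes the UPDATE branch
print(conn.execute("SELECT * FROM stock_price_alert").fetchall())
# prints [(1, 0, 1)]  (SQLite stores the booleans as 0/1)
```

Every call costs two round trips, and the application code has to carry both statements.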
Postgres supports INSERT INTO TABLE ... ON CONFLICT DO UPDATE ...:
-- query 3
INSERT INTO stock_price_alert (stock_price_id, fall_below_alert, rise_above_alert)
VALUES (1, true, false)
ON CONFLICT (stock_price_id) DO UPDATE SET
fall_below_alert = EXCLUDED.fall_below_alert,
rise_above_alert = EXCLUDED.rise_above_alert;
Instead of using query (1) or (2), can I always use query (3)? Then I don't need to issue the SELECT query beforehand, which simplifies the code.
But I am wondering which is the best practice. Will query (3) cause performance issues or unwanted side effects? Thanks.
Answer 1:
Query 3 is the Postgres syntax for "UPSERT" (= UPDATE or INSERT), introduced in Postgres 9.5.
From the documentation:
ON CONFLICT DO UPDATE guarantees an atomic INSERT or UPDATE outcome; provided there is no independent error, one of those two outcomes is guaranteed, even under high concurrency. This is also known as UPSERT – “UPDATE or INSERT”.
This is the best practice for what you are trying to achieve.
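A minimal sketch of query (3) doing both jobs, assuming SQLite 3.24 or newer (via Python's sqlite3 module) as a stand-in for Postgres. SQLite's upsert clause follows the syntax established by PostgreSQL, so the statement is the asker's query (3) with ? placeholders in place of the literal values; the foreign key is again omitted.

```python
import sqlite3

# Query (3) from the question, parameterised. Assumption: SQLite >= 3.24,
# whose ON CONFLICT ... DO UPDATE clause mirrors the Postgres syntax.
UPSERT = """
INSERT INTO stock_price_alert (stock_price_id, fall_below_alert, rise_above_alert)
VALUES (?, ?, ?)
ON CONFLICT (stock_price_id) DO UPDATE SET
    fall_below_alert = EXCLUDED.fall_below_alert,
    rise_above_alert = EXCLUDED.rise_above_alert
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stock_price_alert ("
    " stock_price_id INTEGER PRIMARY KEY NOT NULL,"
    " fall_below_alert BOOLEAN NOT NULL,"
    " rise_above_alert BOOLEAN NOT NULL)"
)

conn.execute(UPSERT, (1, True, False))  # no row with id 1 yet: inserts
conn.execute(UPSERT, (1, False, True))  # id 1 conflicts: updates it in place
conn.commit()

print(conn.execute("SELECT * FROM stock_price_alert").fetchall())
# prints [(1, 0, 1)]
```

Because the existence check and the write happen inside one statement, there is no separate SELECT round trip and no window in which another session can slip in between the check and the write.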
Answer 2:
I noticed (and tested) that for INSERTs (I have yet to test UPSERTs) it is much faster to use a WHERE NOT EXISTS in addition to ON CONFLICT, typically about 3x faster than letting ON CONFLICT handle the existence checks alone. I think this may carry over to UPSERTs, making it likely faster to do an INSERT and then an UPDATE. Here is my test for inserts only:
--so I can keep rerunning
DROP TABLE if exists temp1;
DROP TABLE if exists temp2;
--create 10 million rows
SELECT GENERATE_SERIES AS id INTO TEMP temp1
FROM GENERATE_SERIES(1, 10000000);
CREATE UNIQUE INDEX ux_id ON temp1(id);
ALTER TABLE temp1 CLUSTER ON ux_id;
--create a second table to insert from, with the same data
SELECT * INTO TEMP temp2
FROM temp1;
CREATE UNIQUE INDEX ux_id2 ON temp2(id);
ALTER TABLE temp2 CLUSTER ON ux_id2;
--test inserting with ON CONFLICT only
INSERT INTO temp1(id)
SELECT id
FROM temp2 ON CONFLICT DO NOTHING;
--execution time: 14.71s (1 million rows)
--test inserting with NOT EXISTS and ON CONFLICT
INSERT INTO temp1(id)
SELECT t2.id
FROM temp2 t2
WHERE NOT EXISTS (SELECT 1 FROM temp1 t1 WHERE t2.id = t1.id)
ON CONFLICT DO NOTHING;
--execution time: 5.78s (1 million rows)
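A scaled-down, correctness-only sketch of the two INSERT forms above, again assuming Python's sqlite3 module as a stand-in for Postgres; the table sizes and the helper name run are hypothetical. In the benchmark above temp2 is an exact copy of temp1, so every source row conflicts; here half of the source rows are new, which checks that adding WHERE NOT EXISTS does not change which rows end up in temp1. It says nothing about the Postgres timings.

```python
import sqlite3

N = 1_000  # hypothetical size, far smaller than the benchmark

# Form 1: rely on ON CONFLICT alone. SQLite needs a WHERE clause before
# ON CONFLICT when inserting from a SELECT (a parser quirk), hence WHERE true.
ON_CONFLICT_ONLY = """
INSERT INTO temp1 (id)
SELECT id FROM temp2 WHERE true
ON CONFLICT DO NOTHING
"""

# Form 2: filter out existing ids first, keep ON CONFLICT as a safety net.
NOT_EXISTS_PLUS_ON_CONFLICT = """
INSERT INTO temp1 (id)
SELECT t2.id FROM temp2 t2
WHERE NOT EXISTS (SELECT 1 FROM temp1 t1 WHERE t1.id = t2.id)
ON CONFLICT DO NOTHING
"""


def run(statement):
    """Build fresh tables, run one form, and return the final ids in temp1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE temp1 (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE temp2 (id INTEGER PRIMARY KEY)")
    # temp1 holds 1..N; temp2 holds 1..2N, so ids 1..N already exist.
    conn.executemany("INSERT INTO temp1 (id) VALUES (?)", [(i,) for i in range(1, N + 1)])
    conn.executemany("INSERT INTO temp2 (id) VALUES (?)", [(i,) for i in range(1, 2 * N + 1)])
    conn.execute(statement)
    ids = [row[0] for row in conn.execute("SELECT id FROM temp1 ORDER BY id")]
    conn.close()
    return ids


a = run(ON_CONFLICT_ONLY)
b = run(NOT_EXISTS_PLUS_ON_CONFLICT)
print(len(a), len(b), a == b)
# prints 2000 2000 True
```

ON CONFLICT is kept in the second form because the NOT EXISTS check alone does not protect against a concurrent session inserting the same id after the subquery has run; without it, that race would surface as a unique-violation error.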
Source: https://stackoverflow.com/questions/48922972/postgres-insert-on-conflict-do-update-vs-insert-or-update