Fastest check if row exists in PostgreSQL

有刺的猬 2020-11-28 19:20

I have a bunch of rows that I need to insert into a table, but these inserts are always done in batches. So I want to check if a single row from the batch exists in the table, because then I know they all were inserted.

8 answers
  •  有刺的猬
    2020-11-28 19:53

    I would like to propose another thought to specifically address your sentence: "So I want to check if a single row from the batch exists in the table because then I know they all were inserted."

    You are making things efficient by inserting in "batches", but then doing existence checks one record at a time? That seems counterintuitive to me. When you say "inserts are always done in batches", I take it you mean you are inserting multiple records with one INSERT statement. You need to realize that Postgres is ACID compliant, and a single statement is atomic. If you insert multiple records (a batch of data) with one INSERT statement, there is no need to check whether some of them were inserted. The statement either succeeds or fails as a whole, so either all records are inserted or none are.

    On the other hand, if your C# code is simply running a "set" of separate INSERT statements, for example in a loop, and in your mind this is a "batch", then you should not describe it as "inserts are always done in batches". The fact that you expect that part of what you call a "batch" might not be inserted, and hence feel the need for a check, strongly suggests this is the case, and then you have a more fundamental problem. You need to change your paradigm to actually insert multiple records with one INSERT, and forgo checking whether the individual records made it.
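
    As a rough sketch of what "one INSERT for the whole batch" can look like from application code, here is a Python version. It uses the standard-library sqlite3 module as a stand-in so it runs without a server. That is my assumption, not your setup: the helper name insert_batch is hypothetical, and a PostgreSQL driver would use its own placeholder style instead of "?".

```python
import sqlite3


def insert_batch(conn, rows):
    """Insert all rows with ONE multi-row INSERT statement (hypothetical helper)."""
    if not rows:
        return 0
    # One "(?, ?, ?)" group per row, so the whole batch is a single statement.
    placeholders = ", ".join(["(?, ?, ?)"] * len(rows))
    sql = (
        "INSERT INTO temp_test (sometext, userid, somethingtomakeitfail) "
        "VALUES " + placeholders
    )
    # Flatten [(a, b, c), ...] into [a, b, c, ...] to match the placeholders.
    params = [value for row in rows for value in row]
    cur = conn.execute(sql, params)
    return cur.rowcount


# In-memory SQLite database, used here only so the sketch is runnable as-is.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_test ("
    "id INTEGER PRIMARY KEY, sometext TEXT, userid INT, "
    "somethingtomakeitfail INT UNIQUE)"
)
inserted = insert_batch(conn, [("foo", 1, 1), ("bar", 2, 2), ("baz", 3, 3)])
conn.commit()
total = conn.execute("SELECT COUNT(*) FROM temp_test").fetchone()[0]
```

    The point is only that the loop builds one statement instead of executing one statement per record. Very large batches may need to be split, since databases limit the number of bound parameters per statement.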

    Consider this example:

    CREATE TABLE temp_test (
        id SERIAL PRIMARY KEY,
        sometext TEXT,
        userid INT,
        somethingtomakeitfail INT UNIQUE
    );

    -- insert a batch of 3 rows
    INSERT INTO temp_test (sometext, userid, somethingtomakeitfail) VALUES
        ('foo', 1, 1),
        ('bar', 2, 2),
        ('baz', 3, 3);

    -- inspect what we inserted
    SELECT * FROM temp_test;

    -- this entire statement fails, so there is no need to check which rows made it
    INSERT INTO temp_test (sometext, userid, somethingtomakeitfail) VALUES
        ('foo', 2, 4),
        ('bar', 2, 5),
        ('baz', 3, 3);  -- deliberate failure: 3 already exists in the UNIQUE column

    -- check again: the table is unchanged since the last successful insert,
    -- so none of the rows from the second insert made it in
    SELECT * FROM temp_test;

    This is in fact the paradigm for any ACID-compliant DB, not just PostgreSQL. In other words, you are better off fixing your "batch" concept and avoiding row-by-row checks in the first place.
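
    To illustrate the "not just PostgreSQL" point, the same experiment can be repeated against SQLite from Python's standard library. Treat this as a sketch under assumptions: it relies on SQLite's default conflict handling, which backs out the failing statement, and the function name demo_atomic_batch_insert is made up for the example.

```python
import sqlite3

# One statement that inserts three rows at once.
INSERT_SQL = (
    "INSERT INTO temp_test (sometext, userid, somethingtomakeitfail) VALUES "
    "(?, ?, ?), (?, ?, ?), (?, ?, ?)"
)


def demo_atomic_batch_insert():
    """Hypothetical demo: a failing multi-row INSERT leaves no partial rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE temp_test ("
        "id INTEGER PRIMARY KEY, sometext TEXT, userid INT, "
        "somethingtomakeitfail INT UNIQUE)"
    )
    # First batch: succeeds as a whole.
    conn.execute(INSERT_SQL, ("foo", 1, 1, "bar", 2, 2, "baz", 3, 3))
    conn.commit()

    failed = False
    try:
        # Second batch: the last row repeats the unique value 3.
        conn.execute(INSERT_SQL, ("foo", 2, 4, "bar", 2, 5, "baz", 3, 3))
    except sqlite3.IntegrityError:
        failed = True
    # Commit whatever survived the failed statement (expected: nothing).
    conn.commit()

    rows = conn.execute(
        "SELECT sometext, userid, somethingtomakeitfail "
        "FROM temp_test ORDER BY id"
    ).fetchall()
    conn.close()
    return failed, rows


failed, rows = demo_atomic_batch_insert()
print(failed, rows)
```

    The first two rows of the second batch are valid on their own, yet they do not show up afterwards, which is the all-or-nothing behavior described above.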
