Fastest check if row exists in PostgreSQL

有刺的猬 2020-11-28 19:20

I have a bunch of rows that I need to insert into a table, but these inserts are always done in batches. So I want to check if a single row from the batch exists in the table, because then I know they all were inserted.

8 Answers
  • 2020-11-28 19:41
    SELECT 1 FROM user_right where userid = ? LIMIT 1
    

    If the result set contains a row, you do not have to insert. Otherwise, insert your records.

  • 2020-11-28 19:47

    How about simply:

    select 1 from tbl where userid = 123 limit 1;
    

    where 123 is the userid of the batch that you're about to insert.

    The above query will return either an empty set or a single row, depending on whether there are records with the given userid.

    If this turns out to be too slow, you could look into creating an index on tbl.userid.

    "if even a single row from batch exists in table, in that case I don't have to insert my rows because I know for sure they all were inserted."

    For this to remain true even if your program gets interrupted mid-batch, I'd recommend that you make sure you manage database transactions appropriately (i.e. that the entire batch gets inserted within a single transaction).
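
    A minimal sketch of the single-transaction advice, in Python. It uses the standard-library sqlite3 module with an in-memory database purely as a stand-in so that it runs anywhere; with PostgreSQL you would go through a driver, but the pattern (insert everything inside one transaction, roll back on any error) is the same. The table layout, the insert_batch and batch_exists helpers, and the fail_after parameter are assumptions made up for this illustration.

```python
import sqlite3


def insert_batch(conn, rows, fail_after=None):
    """Insert all rows in one transaction; roll everything back on error.

    fail_after is a hypothetical knob that simulates the program being
    interrupted after that many rows have been inserted.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            for i, row in enumerate(rows):
                if fail_after is not None and i == fail_after:
                    raise RuntimeError("simulated interruption mid-batch")
                conn.execute(
                    "INSERT INTO user_right (userid, rightid, count) VALUES (?, ?, ?)",
                    row,
                )
        return True
    except RuntimeError:
        return False


def batch_exists(conn, userid):
    """Return True if at least one row for this userid is present."""
    cur = conn.execute(
        "SELECT 1 FROM user_right WHERE userid = ? LIMIT 1", (userid,)
    )
    return cur.fetchone() is not None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_right (userid INTEGER, rightid INTEGER, count INTEGER)")
batch = [(123, 1, 10), (123, 2, 20), (123, 3, 30)]

interrupted = insert_batch(conn, batch, fail_after=2)  # two rows in, then rollback
after_interrupt = batch_exists(conn, 123)
completed = insert_batch(conn, batch)                  # whole batch committed
after_complete = batch_exists(conn, 123)
```

    Because the interrupted attempt is rolled back as a whole, the single-row check only finds a row once the complete batch has been committed.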

  • 2020-11-28 19:49
    select true from tablename where condition limit 1;
    

    I believe that this is the query that postgres uses for checking foreign keys.

    In your case, you could do this in one go too:

    insert into yourtable select $userid, $rightid, $count where not exists (select 1 from yourtable where userid = $userid);
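
    A small runnable sketch of the one-statement approach, in Python with the standard-library sqlite3 module as a stand-in for PostgreSQL (an assumption made so the example is self-contained). It uses NOT EXISTS, which is true when the subquery returns no rows. The column names and the insert_if_absent helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (userid INTEGER, rightid INTEGER, count INTEGER)")

# Insert the given values only when no row for that userid is present yet.
INSERT_IF_ABSENT = """
    INSERT INTO yourtable (userid, rightid, count)
    SELECT ?, ?, ?
    WHERE NOT EXISTS (SELECT 1 FROM yourtable WHERE userid = ?)
"""


def insert_if_absent(conn, userid, rightid, count):
    """Run the conditional insert and return the total row count for userid."""
    with conn:  # commit on success, roll back on error
        conn.execute(INSERT_IF_ABSENT, (userid, rightid, count, userid))
    cur = conn.execute("SELECT COUNT(*) FROM yourtable WHERE userid = ?", (userid,))
    return cur.fetchone()[0]


rows_after_first = insert_if_absent(conn, 123, 1, 10)   # table empty: row is inserted
rows_after_second = insert_if_absent(conn, 123, 2, 20)  # userid 123 present: skipped
```

    The second call leaves the table unchanged, because a row for userid 123 already exists.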
    
  • 2020-11-28 19:50
    INSERT INTO target (userid, rightid, count)
    SELECT userid, rightid, count
    FROM batch
    WHERE NOT EXISTS (
        -- uncorrelated subquery: if any batch row already exists in target,
        -- no rows are inserted at all
        SELECT * FROM target t2, batch b2
        WHERE t2.userid = b2.userid
        -- ... other keyfields ...
    );
    

    BTW: if you want the whole batch to fail in case of a duplicate, then (given a primary key constraint)

    INSERT INTO target (userid, rightid, count)
    SELECT userid, rightid, count
    FROM batch;
    

    will do exactly what you want: either it succeeds, or it fails.
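
    To illustrate the all-or-nothing behaviour, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for PostgreSQL (an assumption, chosen only so the example runs without a server). The composite primary key on (userid, rightid), the sample data, and the copy_batch helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE target (userid INTEGER, rightid INTEGER, count INTEGER,
                         PRIMARY KEY (userid, rightid));
    CREATE TABLE batch  (userid INTEGER, rightid INTEGER, count INTEGER);
    INSERT INTO target VALUES (1, 3, 30);  -- one batch row is already present
    INSERT INTO batch  VALUES (1, 1, 10), (1, 2, 20), (1, 3, 30);
""")


def copy_batch(conn):
    """Copy batch into target with one statement; return True on success."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO target (userid, rightid, count) "
                "SELECT userid, rightid, count FROM batch"
            )
        return True
    except sqlite3.IntegrityError:
        # The duplicate key made the whole statement fail.
        return False


ok = copy_batch(conn)
rows_in_target = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

    The duplicate makes the statement fail, and none of the other batch rows end up in target.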

  • 2020-11-28 19:53

    I would like to propose another thought to specifically address your sentence: "So I want to check if a single row from the batch exists in the table because then I know they all were inserted."

    You are making things efficient by inserting in "batches" but then doing existence checks one record at a time? This seems counterintuitive to me. So when you say "inserts are always done in batches", I take it you mean you are inserting multiple records with one insert statement. You need to realize that Postgres is ACID compliant. If you are inserting multiple records (a batch of data) with one insert statement, there is no need to check whether some were inserted or not. The statement either succeeds or fails: all records will be inserted, or none.

    On the other hand, if your C# code is simply running a "set" of separate insert statements, for example in a loop, and in your mind this is a "batch", then you should not in fact describe it as "inserts are always done in batches". The fact that you expect that part of what you call a "batch" may not actually be inserted, and hence feel the need for a check, strongly suggests this is the case, in which case you have a more fundamental problem. You need to change your paradigm to actually insert multiple records with one insert, and forgo checking whether the individual records made it.

    Consider this example:

    CREATE TABLE temp_test (
        id SERIAL PRIMARY KEY,
        sometext TEXT,
        userid INT,
        somethingtomakeitfail INT UNIQUE
    );

    -- insert a batch of 3 rows
    INSERT INTO temp_test (sometext, userid, somethingtomakeitfail) VALUES
    ('foo', 1, 1),
    ('bar', 2, 2),
    ('baz', 3, 3);

    -- inspect the data of what we inserted
    SELECT * FROM temp_test;

    -- this entire statement will fail .. no need to check which one made it
    INSERT INTO temp_test (sometext, userid, somethingtomakeitfail) VALUES
    ('foo', 2, 4),
    ('bar', 2, 5),
    ('baz', 3, 3);  -- <<--(deliberately simulate a failure)

    -- check it ... everything is the same from the last successful insert ..
    -- no need to check which records from the 2nd insert may have made it in
    SELECT * FROM temp_test;
    

    This is in fact the paradigm for any ACID-compliant DB, not just PostgreSQL. In other words, you are better off if you fix your "batch" concept and avoid having to do any row-by-row checks in the first place.

  • 2020-11-28 19:57

    As @MikeM pointed out:

    select exists(select 1 from contact where id=12)
    

    With an index on contact(id), this check usually takes about 1 ms.

    CREATE INDEX index_contact on contact(id);
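
    For completeness, a minimal sketch of calling this check from application code, in Python with the standard-library sqlite3 module standing in for PostgreSQL (an assumption so the example is runnable as-is). The name column, the sample rows, and the contact_exists helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER, name TEXT)")
conn.execute("CREATE INDEX index_contact ON contact(id)")
conn.executemany(
    "INSERT INTO contact (id, name) VALUES (?, ?)",
    [(12, "a"), (13, "b")],
)


def contact_exists(conn, contact_id):
    """Return True if a contact row with this id exists."""
    cur = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM contact WHERE id = ?)", (contact_id,)
    )
    # EXISTS always yields exactly one row, so fetchone() is never None here.
    return bool(cur.fetchone()[0])


found = contact_exists(conn, 12)
missing = contact_exists(conn, 99)
```

    Unlike the LIMIT 1 form, the EXISTS query always returns exactly one row, so the caller reads a boolean instead of testing for an empty result.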
    