Most Efficient (Fast) T-SQL DELETE For Many Rows?

后悔当初 2020-12-16 06:31

Our server application receives information about rows to add to the database at a rate of 1000-2000 rows per second, all day long. There are two mutually-exclusive columns

8 Answers
  •  醉梦人生
    2020-12-16 07:25

    An OR (or an IN) works almost as if each operand were a separate query. In practice it often turns into a table scan: for each row, the database has to test each operand as a predicate until it finds a match or runs out of operands.

    The only reason to package this up is to make it one logical unit of work. You could also wrap a bunch of deletes in a transaction, and only commit when all finish successfully.

    Quassnoi makes an interesting suggestion, to use a table, but since he then uses INs and ORs, it comes out the same.

    But try this.

    Create a new table that mirrors your real table. Call it u_real_table. Index it on tag and longTag.
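
    One possible way to create the mirror table, as a minimal sketch: `select ... into` copies the column definitions of real_table without copying any rows. It does not copy indexes or constraints, so the two indexes are created explicitly. The index names are hypothetical.

    ```sql
    -- Copy the column layout of real_table into an empty staging table.
    -- top (0) selects no rows, so only the structure is created.
    select top (0) *
      into u_real_table
      from real_table;

    -- Index the two columns used for matching (index names are illustrative).
    create index ix_u_real_table_tag     on u_real_table (tag);
    create index ix_u_real_table_longTag on u_real_table (longTag);
    ```

    If real_table has an identity column, `select ... into` carries that property over as well, which may need handling before `insert ... select *` will work.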

    Put all your incoming data into u_real_table.

    Now, when you're ready to do your bulk operation, join the mirror table to the real table on tag. From the real table, delete all the rows whose tag appears in u_real_table:

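    -- Delete the existing rows that the staged rows will replace.
    -- The delete targets the alias a, which is unambiguous in T-SQL.
    delete a from real_table a
       join u_real_table b on (a.tag = b.tag);
    -- Insert the staged replacements that carry a tag.
    insert into real_table select *
       from u_real_table where tag is not null;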
    

    See what we did here? Since we're joining only on tag, there's a greater chance the tag index can be used.

    First we deleted every existing row that has a replacement, then we inserted the new replacements. We could also do an update here. Which is faster depends on your table structure and its indices.
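
    The update alternative might look roughly like this. It is only a sketch: some_column is a hypothetical placeholder for whatever non-key columns real_table actually has, and each of them would need its own assignment.

    ```sql
    -- Overwrite rows that already exist in real_table with the staged values.
    -- some_column is a placeholder, not a column from the question.
    update a
       set a.some_column = b.some_column
      from real_table a
      join u_real_table b on a.tag = b.tag;

    -- Insert only the staged rows that had no match in real_table.
    insert into real_table
    select b.*
      from u_real_table b
     where b.tag is not null
       and not exists (select 1 from real_table a where a.tag = b.tag);
    ```

    This avoids deleting and re-inserting rows that already exist, at the cost of listing the columns to update.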

    We didn't have to write a script to do it; we just had to insert the records into u_real_table.

    Now we do the same thing for longTags:

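    -- Same pattern, matching on longTag instead of tag.
    delete a from real_table a
       join u_real_table b on (a.longTag = b.longTag);
    -- Insert the staged replacements that carry a longTag.
    insert into real_table select *
       from u_real_table where longTag is not null;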
    

    Finally, we clear out u_real_table:

    delete from u_real_table;
    

    Obviously, we wrap each delete/insert pair in a transaction, so that the delete only becomes real when the subsequent insert succeeds, and then we wrap the whole thing in another transaction, because it is a logical unit of work.
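
    A sketch of the transactional wrapper, under one simplification: in SQL Server an inner commit only decrements @@TRANCOUNT and nothing is durable until the outermost commit, so a single outer transaction is used here instead of nested ones. The delete statements use the alias form (`delete a from ...`).

    ```sql
    set xact_abort on;  -- run-time errors doom the open transaction

    begin try
        begin transaction;

        -- Replace rows matched on tag.
        delete a from real_table a
          join u_real_table b on a.tag = b.tag;
        insert into real_table
        select * from u_real_table where tag is not null;

        -- Replace rows matched on longTag.
        delete a from real_table a
          join u_real_table b on a.longTag = b.longTag;
        insert into real_table
        select * from u_real_table where longTag is not null;

        -- Empty the staging table as part of the same unit of work.
        delete from u_real_table;

        commit transaction;
    end try
    begin catch
        -- Undo everything if any statement failed, then re-raise the error.
        if @@trancount > 0 rollback transaction;
        throw;  -- requires SQL Server 2012 or later
    end catch;
    ```

    If any statement fails, neither the deletes nor the inserts take effect and u_real_table keeps its rows for a retry.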

    This method reduces your manual work, reduces the possibility of a manual error, and has some chance of speeding up the deletes.

    Note that this relies on missing tags and longTags correctly being null, not zero or the empty string.
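
    If the incoming data cannot be trusted on this point, the staging table could be normalized before the delete/insert steps. The sketch below assumes tag is numeric with 0 meaning "missing" and longTag is a string with '' meaning "missing"; neither type is stated in the question, so adjust the literals to the real column types.

    ```sql
    -- Turn sentinel values into real NULLs before matching.
    -- Assumption: tag is numeric and 0 means "no tag".
    update u_real_table set tag = null where tag = 0;

    -- Assumption: longTag is a string and '' means "no longTag".
    update u_real_table set longTag = null where longTag = '';
    ```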
