Our server application receives information about rows to add to the database at a rate of 1000-2000 rows per second, all day long. There are two mutually exclusive columns, tag and longTag; each row has a value in exactly one of them.
An OR (or an IN) behaves almost as if each operand were a separate query: it turns into a table scan, and for each row the database has to test each OR operand as a predicate until it finds a match or runs out of operands.
The only reason to package this up is to make it one logical unit of work. You could also wrap a bunch of deletes in a transaction, and only commit when all finish successfully.
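In T-SQL, using the MyRecords table from the question, that might look something like this (SET XACT_ABORT ON makes any runtime error roll the whole batch back, so nothing commits unless every delete succeeds):

SET XACT_ABORT ON;  -- any error aborts and rolls back the whole transaction
BEGIN TRAN;
DELETE FROM MyRecords WHERE tag = 1;
DELETE FROM MyRecords WHERE tag = 2;
DELETE FROM MyRecords WHERE longTag = 'LongTag1';
COMMIT;  -- nothing becomes permanent until all three deletes succeed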
Quassnoi makes an interesting suggestion (use a table), but since he then uses INs and ORs, it comes out the same.
But try this.
Create a new table that mirrors your real table. Call it u_real_table. Index it on tag and longTag.
Put all your incoming data into u_real_table.
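A sketch of the mirror table's DDL; the column types here are assumptions, and in practice they should simply copy whatever real_table uses:

CREATE TABLE u_real_table (
    tag     INT          NULL,
    longTag VARCHAR(100) NULL
    -- ... plus the rest of real_table's columns ...
);
CREATE INDEX ix_u_real_table_tag ON u_real_table (tag);
CREATE INDEX ix_u_real_table_longTag ON u_real_table (longTag);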
Now, when you're ready to do your bulk thing, instead join the mirror table to the real table on tag, and delete from the real table every row whose tag appears in u_real_table:
delete a from real_table a
join u_real_table b on (a.tag = b.tag);
insert into real_table select *
from u_real_table where tag is not null;
See what we did here? Since we're joining only on tag, there's a greater chance the tag index can be used.
First we deleted the rows being replaced, then we inserted their new versions. We could also do an update here; which is faster depends on your table structure and its indices.
We didn't have to write a script to generate the statements; we just had to insert the incoming records into u_real_table.
Now we do the same thing for longTags:
delete a from real_table a
join u_real_table b on (a.longTag = b.longTag);
insert into real_table select *
from u_real_table where longTag is not null;
Finally, we clear out u_real_table:
delete from u_real_table;
Obviously, we wrap each delete/insert pair in a transaction, so that the delete only becomes real when the subsequent insert succeeds, and then we wrap the whole thing in another transaction, because it is one logical unit of work.
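One way to sketch that in T-SQL (note that SQL Server doesn't have true nested transactions: an inner COMMIT just decrements @@TRANCOUNT, and only the outermost COMMIT makes the work durable):

BEGIN TRY
    BEGIN TRAN;  -- the whole logical unit of work

    -- pair 1: replace rows matched by tag
    DELETE a FROM real_table a JOIN u_real_table b ON (a.tag = b.tag);
    INSERT INTO real_table SELECT * FROM u_real_table WHERE tag IS NOT NULL;

    -- pair 2: replace rows matched by longTag
    DELETE a FROM real_table a JOIN u_real_table b ON (a.longTag = b.longTag);
    INSERT INTO real_table SELECT * FROM u_real_table WHERE longTag IS NOT NULL;

    DELETE FROM u_real_table;
    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;  -- a failure anywhere undoes everything
END CATCH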
This method reduces your manual work, reduces the possibility of a manual error, and has some chance of speeding up the deletes.
Note that this relies on missing tags and longTags correctly being null, not zero or the empty string.
It seems that your table is indexed neither on (tag) nor on (longTag). Build two indexes: one on (tag), one on (longTag).
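In T-SQL, assuming the table from the question is MyRecords:

CREATE INDEX ix_MyRecords_tag ON MyRecords (tag);
CREATE INDEX ix_MyRecords_longTag ON MyRecords (longTag);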
If you are planning to delete a really large number of records, declare two table variables, fill them with values, and delete like this:
DECLARE @tag TABLE (id INT);
DECLARE @longTag TABLE (id VARCHAR(50));

INSERT
INTO @tag
VALUES (1);

INSERT
INTO @tag
VALUES (2);

/* ... */

INSERT INTO @longTag
VALUES ('LongTag1');

/* ... */

DELETE
FROM MyRecords
WHERE tag IN (SELECT id FROM @tag)
   OR longTag IN (SELECT id FROM @longTag);
You may also try to perform a two-pass DELETE:
DELETE
FROM MyRecords
WHERE tag IN (SELECT id FROM @tag);

DELETE
FROM MyRecords
WHERE longTag IN (SELECT id FROM @longTag);
and see which statement runs longer, to see if there's an issue with the indexes.
Indexing:
Consider using an indexed persisted computed column for longTag which stores a checksum of longTag. Instead of indexing 'LongTag1', you index a 4-byte int value (86939596).
Then your look-ups are [hopefully*] faster, and you just have to include the longTag value in the query/delete. Your code would be slightly more complex, but the indexing is likely to be much more efficient.
* Requires testing
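A sketch of what that could look like in T-SQL (the column and index names are made up; CHECKSUM is SQL Server's built-in hash function, and keeping the longTag predicate in the DELETE guards against hash collisions):

ALTER TABLE MyRecords ADD longTagHash AS CHECKSUM(longTag) PERSISTED;
CREATE INDEX ix_MyRecords_longTagHash ON MyRecords (longTagHash);

DELETE FROM MyRecords
WHERE longTagHash = CHECKSUM('LongTag1')  -- index seek on the 4-byte hash
  AND longTag = 'LongTag1';               -- weed out checksum collisions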
Check out this video which showcases how to do a 'nibbling' delete. The process works well and can definitely reduce the locking/collision problems you're seeing:
http://www.sqlservervideos.com/video/nibbling-deletes
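The gist of the technique, as I understand it, is to delete in small batches so no single statement holds locks for long. Roughly, a sketch like this (the batch size of 500 is just a starting point to tune):

WHILE 1 = 1
BEGIN
    DELETE TOP (500) FROM MyRecords
    WHERE tag IN (1, 2, 555) OR longTag = 'LongTag1';

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to nibble
END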
Maybe:
DELETE FROM MyRecords
WHERE tag IN (1, 2, 555) -- build a list
OR longTag IN ('LongTag1')
I suspect indexes would help your deletes but drastically slow your inserts, so I wouldn't play with that too much. Then again, my intuition isn't exactly perfect; you might be able to tune FillFactor or other settings to get around that issue, and the one thing I do know for sure is that you really want to profile both anyway.
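(For what it's worth, fill factor is something you set when an index is created or rebuilt; the 90 below is purely an illustrative value:)

CREATE INDEX ix_MyRecords_tag ON MyRecords (tag) WITH (FILLFACTOR = 90);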
Another option is to load new inserts into a temp table (named something like InputQueue), and then join the temp table to MyRecords to handle filtering updates. This would also make it easy to do the update in two steps: you could delete tags and longTags as separate operations, and that might turn out to be much more efficient.
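A rough sketch, assuming a #InputQueue temp table that mirrors MyRecords' tag and longTag columns:

-- stage the incoming rows in #InputQueue first, then:
DELETE m FROM MyRecords m JOIN #InputQueue q ON m.tag = q.tag;
DELETE m FROM MyRecords m JOIN #InputQueue q ON m.longTag = q.longTag;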
Using ORs may cause a table scan - can you break it up into four statements? Wrapping each in a transaction may also speed things up.
DELETE from MyRecords
WHERE tag = 1
DELETE from MyRecords
WHERE tag = 2
DELETE from MyRecords
WHERE tag = 555
DELETE from MyRecords
WHERE longTag = 'LongTag1'