SQL Batched Delete

星月不相逢 2021-02-20 04:29

I have a table in SQL Server 2005 which has approx 4 billion rows in it. I need to delete approximately 2 billion of these rows. If I try and do it in a single transaction, the transaction log fills up and the delete fails.

9 Answers
  • 2021-02-20 04:49

    I would do something similar to the temp table suggestions but I'd select into a new permanent table the rows you want to keep, drop the original table and then rename the new one. This should have a relatively low tran log impact. Obviously remember to recreate any indexes that are required on the new table after you've renamed it.

    Just my two p'enneth.
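
    The keep/drop/rename idea above can be sketched end-to-end. This is a runnable illustration using SQLite from Python (the SQL Server equivalent would use SELECT ... INTO and sp_rename); the table and column names (`events`, `keep_flag`) are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, keep_flag INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i, i % 2) for i in range(10)])

# 1. Select the rows you want to keep into a new permanent table.
conn.execute("CREATE TABLE events_new AS SELECT * FROM events WHERE keep_flag = 1")
# 2. Drop the original table.
conn.execute("DROP TABLE events")
# 3. Rename the new table into place (then recreate any required indexes).
conn.execute("ALTER TABLE events_new RENAME TO events")

remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(remaining)  # 5
```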

  • 2021-02-20 04:51

    Well, if you were using SQL Server Partitioning, say based on the date column, you would have possibly switched out the partitions that are no longer required. A consideration for a future implementation perhaps.

    I think the best option may be as you say, to delete the data in smaller batches, rather than in one hit, so as to avoid any potential blocking issues.

    You could also consider the following method:

    1. Copy the data to keep into a temporary table
    2. Truncate the original table to purge all data
    3. Move everything from the temporary table back into the original table

    Your indexes would also be rebuilt as the data was added back to the original table.
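
    The three steps above can be sketched as follows, using SQLite from Python for a runnable illustration (SQLite has no TRUNCATE, so an unqualified DELETE stands in for step 2; in SQL Server you would use TRUNCATE TABLE, which is minimally logged). All names here (`orders`, `status`) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, "open" if i % 3 else "closed") for i in range(9)])

# 1. Copy the data to keep into a temporary table.
conn.execute("CREATE TEMP TABLE keep AS SELECT * FROM orders WHERE status = 'open'")
# 2. Purge the original table (TRUNCATE TABLE orders in SQL Server).
conn.execute("DELETE FROM orders")
# 3. Move everything from the temporary table back into the original table.
conn.execute("INSERT INTO orders SELECT * FROM keep")

kept = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(kept)  # 6
```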

  • 2021-02-20 04:56

    You can 'nibble' the deletes, which also means that you don't put a massive load on the database. If your t-log backups run every 10 minutes, then you should be OK to run this once or twice over the same interval. You can schedule it as a SQL Agent job.

    try something like this:

    DECLARE @count int
    SET @count = 10000

    DELETE FROM table1
    WHERE table1id IN (
        SELECT TOP (@count) table1id
        FROM table1
        WHERE x = 'y'
    )
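
    If you would rather run the nibble to completion in one sitting instead of on a schedule, the same batch can be repeated in a loop until nothing matches. Here is a runnable sketch of that loop using SQLite from Python (LIMIT stands in for TOP); the table and values are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (table1id INTEGER PRIMARY KEY, x TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(i, "y" if i < 25 else "n") for i in range(40)])

batch = 10  # plays the role of @count above
batches_run = 0
while True:
    cur = conn.execute(
        "DELETE FROM table1 WHERE table1id IN "
        "(SELECT table1id FROM table1 WHERE x = 'y' LIMIT ?)", (batch,))
    conn.commit()  # each batch commits separately, keeping the log small
    batches_run += 1
    if cur.rowcount < batch:
        break  # the last, partial batch means no matching rows remain

print(batches_run)  # 3 batches: 10 + 10 + 5
```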
    
  • 2021-02-20 04:57

    I agree with the people who want you to loop over a smaller set of records; this will be faster than trying to do the whole operation in one step. You may need to experiment with the number of records to include in each loop. About 2,000 at a time seems to be the sweet spot in most of the tables I do large deletes from, although a few need smaller amounts, like 500. It depends on the number of foreign keys, the size of the record, triggers, etc., so it really will take some experimenting to find what you need. It also depends on how heavily the table is used. A heavily accessed table will need each iteration of the loop to run for a shorter amount of time. If you can run during off hours, or better yet in single-user mode, then you can have more records deleted in each loop.

    If you don't think you can do this in one night during off hours, it might be best to design the loop with a counter and only do a set number of iterations each night until it is done.

    Further, if you use implicit transactions rather than an explicit one, you can kill the loop query at any time, and the records already deleted will stay deleted, except those in the current round of the loop. Much faster than trying to roll back half a million records because you've brought the system to a halt.

    It is usually a good idea to backup a database immediately before undertaking an operation of this nature.
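
    The counter idea can be sketched concretely: cap the number of batches per run so the job fits inside a maintenance window, and commit per batch so a kill loses at most one batch. This is a runnable miniature using SQLite from Python; the table name, batch size, and cap are all illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, old INTEGER)")
conn.executemany("INSERT INTO log VALUES (?, 1)", [(i,) for i in range(100)])

BATCH = 20        # rows per iteration (the "2,000 sweet spot", scaled down)
MAX_BATCHES = 3   # iterations allowed per nightly run

for _ in range(MAX_BATCHES):
    cur = conn.execute(
        "DELETE FROM log WHERE id IN (SELECT id FROM log WHERE old = 1 LIMIT ?)",
        (BATCH,))
    conn.commit()  # committing per batch means a kill loses at most one batch
    if cur.rowcount == 0:
        break      # finished early: nothing left to delete

left = conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]
print(left)  # 40 rows remain for the next night's run
```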

  • 2021-02-20 04:58

    The short answer is, you can't delete 2 billion rows without incurring some kind of major database downtime.

    Your best option may be to copy the data to a temp table and truncate the original table, but this will fill your tempdb and will use no less logging than deleting the data.

    You will need to delete as many rows as you can until the transaction log fills up, then truncate it each time. The answer provided by Stanislav Kniazev could be modified to do this by increasing the batch size and adding a call to truncate the log file.

  • 2021-02-20 04:59

    In addition to putting this in a batch with a statement to truncate the log, you also might want to try these tricks:

    • Add criteria that matches the first column in your clustered index in addition to your other criteria
    • Drop any indexes from the table and then put them back after the delete is done if that's possible and won't interfere with anything else going on in the DB, but KEEP the clustered index

    For the first point above, for example, if your PK is clustered then find a range which approximately matches the number of rows that you want to delete each batch and use that:

    DECLARE @max_id INT, @start_id INT, @end_id INT, @interval INT
    SELECT @start_id = MIN(id), @max_id = MAX(id) FROM My_Table
    SET @interval = 100000  -- You need to determine the right number here
    SET @end_id = @start_id + @interval

    WHILE (@start_id <= @max_id)
    BEGIN
        DELETE FROM My_Table WHERE id BETWEEN @start_id AND @end_id AND <your criteria>

        SET @start_id = @end_id + 1
        SET @end_id = @end_id + @interval
    END
    