How to efficiently delete rows from a 500,000+ row table without using TRUNCATE TABLE

死守一世寂寞 2020-11-30 01:30

Let's say we have a table Sales with 30 columns and 500,000 rows. I would like to delete 400,000 of them (those where toDelete = '1').

8 answers
  • 2020-11-30 01:41

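    Running a single DELETE FROM TableName does the entire delete in one large transaction. This is expensive: every deleted row is logged inside that one transaction, and its locks are held until it commits.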

    Here is another option, which deletes rows in batches:

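    deleteMore:
    DELETE TOP (10000) FROM Sales WHERE toDelete = '1'  -- delete at most 10,000 rows per batch
    IF @@ROWCOUNT != 0  -- rows were deleted, so more may remain: run another batch
        GOTO deleteMore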
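    The same batching loop can be sketched outside SQL Server. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in, not SQL Server itself; SQLite has no DELETE TOP, so each batch picks up to batch_size rowids with LIMIT and deletes those. The table layout, the helper name delete_in_batches, and the batch size are illustrative assumptions.

```python
import sqlite3


def delete_in_batches(conn: sqlite3.Connection, batch_size: int = 10000) -> int:
    """Hypothetical helper: delete rows flagged toDelete = '1' in batches.

    Returns the total number of rows deleted.
    """
    total = 0
    while True:
        # SQLite lacks DELETE TOP, so select one batch of rowids with LIMIT.
        cur = conn.execute(
            "DELETE FROM Sales WHERE rowid IN "
            "(SELECT rowid FROM Sales WHERE toDelete = '1' LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # each batch is its own short transaction
        if cur.rowcount == 0:  # plays the role of IF @@ROWCOUNT != 0 ... GOTO
            break
        total += cur.rowcount
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (id INTEGER PRIMARY KEY, toDelete TEXT)")
    # Scaled-down version of the question: 500 rows, 400 of them flagged.
    conn.executemany(
        "INSERT INTO Sales (toDelete) VALUES (?)",
        [("1" if i % 5 else "0",) for i in range(500)],
    )
    conn.commit()
    deleted = delete_in_batches(conn, batch_size=100)
    remaining = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
    print(deleted, remaining)
```

    Run as a script, the demo deletes the 400 flagged rows in four batches of 100 and prints `400 100`.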
    
  • 2020-11-30 01:41

    One way I have had to do this in the past is to have a stored procedure or script that deletes n records at a time, repeated until no matching rows are left.

    DELETE TOP (1000) FROM Sales WHERE toDelete='1'  -- TOP in a DELETE requires parentheses
    
  • 2020-11-30 01:45

    You should try to give it a ROWLOCK hint so it will not lock the entire table. However, if you delete a lot of rows, lock escalation will still occur (SQL Server attempts it once a single statement holds roughly 5,000 locks).

    Also, make sure you have a non-clustered filtered index on the toDelete column that includes only the rows where toDelete = '1'. If possible, make it a bit column rather than varchar (or whatever type it is now).

    DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
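    SQLite supports the same idea as a filtered index under the name "partial index". A minimal sketch, again using sqlite3 as a stand-in: the index name ix_sales_todelete and the helper flagged_rows_plan are hypothetical, the index stores only rows where toDelete = '1', and EXPLAIN QUERY PLAN shows whether the planner picks it for the flagged-row lookup. SQL Server's CREATE NONCLUSTERED INDEX ... WHERE syntax differs in its details.

```python
import sqlite3


def flagged_rows_plan(conn: sqlite3.Connection) -> list:
    """Hypothetical helper: return SQLite's plan text for the flagged-row lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT rowid FROM Sales WHERE toDelete = '1'"
    ).fetchall()
    return [row[-1] for row in rows]  # the last column holds the plan description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (id INTEGER PRIMARY KEY, toDelete TEXT)")
# Partial ("filtered") index: only rows with toDelete = '1' are stored in it.
conn.execute(
    "CREATE INDEX ix_sales_todelete ON Sales (toDelete) WHERE toDelete = '1'"
)

for line in flagged_rows_plan(conn):
    print(line)
```

    The printed plan should mention ix_sales_todelete, meaning the lookup reads the small index instead of scanning the whole table.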
    

    Ultimately, you can try to iterate over the table and delete in chunks.

    Updated

    Since while loops and chunk deletes are the new pink here, I'll throw in my version too (combined with my previous answer):

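    -- SET ROWCOUNT caps every following statement in the session at 100 rows (it is
    -- deprecated for DELETE/INSERT/UPDATE; DELETE TOP (100) is the modern equivalent)
    SET ROWCOUNT 100
    DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
    
    WHILE @@ROWCOUNT > 0
    BEGIN
      DELETE FROM Sales WITH(ROWLOCK) WHERE toDelete='1'
    END
    
    SET ROWCOUNT 0  -- remove the cap so later statements in the session are not limited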
    
  • 2020-11-30 01:50

    I'll leave my answer here, since I was able to test different approaches for mass deletes and updates (I had to update and then delete 125+ million rows; the server has 16 GB of RAM, a Xeon E5-2680 @ 2.7 GHz, and SQL Server 2012).

    TL;DR: always update/delete by primary key, never by any other condition. If you can't use the PK directly, create a temp table, fill it with the PK values, and update/delete your table by joining to that temp table. Use indexes for this.

    I started with the solution from above (by @Kevin Aenmey), but this approach turned out to be inappropriate: my database was live, handling a couple of hundred transactions per second, and there was some blocking involved (there was an index on all three fields in the condition, and using WITH(ROWLOCK) didn't change anything).

    So I added a WAITFOR statement, which gave the database time to process other transactions between batches.

    deleteMore:
    WAITFOR DELAY '00:00:01'
    DELETE TOP(1000) FROM MyTable WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3
    IF @@ROWCOUNT != 0
        goto deleteMore
    

    This approach was able to process ~1.6 million rows/hour for updating and ~0.2 million rows/hour for deleting.
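    The throttled loop can be sketched in Python too. In this minimal sketch time.sleep stands in for WAITFOR DELAY and sqlite3 stands in for SQL Server; the helper name throttled_delete and its parameters are hypothetical, and the three-column condition mirrors the example above. Unlike the T-SQL version, it pauses after each non-empty batch rather than before every DELETE.

```python
import sqlite3
import time


def throttled_delete(conn, criteria, batch_size=1000, pause_seconds=1.0):
    """Hypothetical helper: delete matching rows in batches, pausing between
    batches so other transactions get a chance to run (like WAITFOR DELAY)."""
    total = 0
    while True:
        cur = conn.execute(
            "DELETE FROM MyTable WHERE rowid IN ("
            " SELECT rowid FROM MyTable"
            " WHERE Column1 = ? AND Column2 = ? AND Column3 = ? LIMIT ?)",
            (*criteria, batch_size),
        )
        conn.commit()  # end the batch's transaction before sleeping
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
        time.sleep(pause_seconds)  # leave a window for concurrent work
```

    For example, throttled_delete(conn, (1, 1, 1), pause_seconds=0) disables the pause, which is handy for testing; on a live system the pause trades total runtime for less blocking.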

    Turning to temp tables changed things quite a lot.

    deleteMore:
    SELECT TOP 10000 Id /* Id is the PK */
      INTO #Temp 
      FROM MyTable WHERE Column1 = @Criteria1 AND Column2 = @Criteria2 AND Column3 = @Criteria3 
    
    DELETE MT
      FROM MyTable MT
      JOIN #Temp T ON T.Id = MT.Id 
    
    /* you can use IN operator, it doesn't change anything
     DELETE FROM MyTable WHERE Id IN (SELECT Id FROM #Temp)
    
     */
    IF @@ROWCOUNT > 0 BEGIN
        DROP TABLE #Temp
        WAITFOR DELAY '00:00:01'
        goto deleteMore
    END ELSE BEGIN
        DROP TABLE #Temp
        PRINT 'This is the end, my friend'
    END
    

    This solution processed ~25 million rows/hour for updating (15x faster) and ~2.2 million rows/hour for deleting (11x faster).
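    The temp-table pattern can be sketched with sqlite3 as well. A minimal sketch under the same stand-in assumptions: each round fills a temporary table (the name temp_ids and the helper name are hypothetical) with up to batch_size primary keys found by the slow criteria, then deletes strictly by primary key. SQLite's DELETE has no JOIN, so the sketch uses the IN (SELECT ...) form, which the answer notes performs the same.

```python
import sqlite3


def delete_via_pk_temp_table(conn, criteria, batch_size=10000):
    """Hypothetical helper: collect a batch of primary keys first,
    then delete by primary key only. Returns the total deleted."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS temp_ids (Id INTEGER PRIMARY KEY)")
    total = 0
    while True:
        conn.execute("DELETE FROM temp_ids")  # empty the key buffer
        # Step 1: find one batch of keys using the original criteria.
        conn.execute(
            "INSERT INTO temp_ids (Id)"
            " SELECT Id FROM MyTable"
            " WHERE Column1 = ? AND Column2 = ? AND Column3 = ? LIMIT ?",
            (*criteria, batch_size),
        )
        # Step 2: delete by primary key only.
        cur = conn.execute("DELETE FROM MyTable WHERE Id IN (SELECT Id FROM temp_ids)")
        conn.commit()
        if cur.rowcount == 0:
            break
        total += cur.rowcount
    conn.execute("DROP TABLE temp_ids")  # like DROP TABLE #Temp
    return total
```

    The split matters because the expensive predicate is evaluated only while collecting keys, and the DELETE itself touches rows by primary key alone.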

  • 2020-11-30 01:51

    I have used the following to delete around 50 million records:

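    DECLARE @BatchSize INT = 4000;  -- placeholder value; see the note on batch size below
    
    DeleteOperation:
         -- No outer BEGIN/COMMIT TRANSACTION: each DELETE commits as its own small
         -- transaction, which is the point of deleting in batches.
         DELETE TOP (@BatchSize)
         FROM  [database_name].[database_schema].[database_table]  -- add a WHERE clause to target specific rows
    
         IF @@ROWCOUNT > 0
         GOTO DeleteOperation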
    

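    Please note that keeping the batch size below 5,000 is less expensive on resources, likely because each batch then stays under the roughly 5,000-lock threshold at which SQL Server attempts lock escalation.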

  • 2020-11-30 02:01

    I assume the best way to delete a huge number of records is to delete them by primary key.

    So generate a T-SQL script that contains one DELETE statement per row to remove, and then execute that script.

    For example, the code below generates that script:

    GO
    SET NOCOUNT ON
    
    SELECT   'DELETE FROM  DATA_ACTION WHERE ID = ' + CAST(ID AS VARCHAR(50)) + ';' + CHAR(13) + CHAR(10) + 'GO'
    FROM    DATA_ACTION
    WHERE  YEAR(AtTime) = 2014
    

    The output file will contain lines like:

    DELETE FROM  DATA_ACTION WHERE ID = 123;
    GO
    DELETE FROM  DATA_ACTION WHERE ID = 124;
    GO
    DELETE FROM  DATA_ACTION WHERE ID = 125;
    GO
    

    Then use the sqlcmd utility to execute the script:

    sqlcmd -S [Instance Name] -E -d [Database] -i [Script]
    

    You can find this approach explained here: https://www.mssqltips.com/sqlservertip/3566/deleting-historical-data-from-a-large-highly-concurrent-sql-server-database-table/
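    The script-generation step can also be sketched in Python. This minimal sketch uses sqlite3 as a stand-in for SQL Server and a hypothetical helper build_delete_script: it emits one DELETE per primary key for rows whose AtTime falls in 2014, filtering with a date range instead of YEAR() so an index on AtTime could still be used, and then runs the script. GO is a batch separator understood by sqlcmd and SSMS rather than a T-SQL statement, so the sketch leaves it out.

```python
import sqlite3


def build_delete_script(conn, start, end):
    """Hypothetical helper: one 'DELETE ... WHERE ID = n;' line per matching row."""
    ids = conn.execute(
        "SELECT ID FROM DATA_ACTION WHERE AtTime >= ? AND AtTime < ? ORDER BY ID",
        (start, end),
    ).fetchall()
    # IDs come from an INTEGER column, so formatting them into SQL is safe here.
    return "\n".join(f"DELETE FROM DATA_ACTION WHERE ID = {row[0]};" for row in ids)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DATA_ACTION (ID INTEGER PRIMARY KEY, AtTime TEXT)")
conn.executemany(
    "INSERT INTO DATA_ACTION (ID, AtTime) VALUES (?, ?)",
    [(123, "2014-03-01"), (124, "2014-07-15"), (125, "2014-12-31"), (126, "2015-01-02")],
)
conn.commit()

script = build_delete_script(conn, "2014-01-01", "2015-01-01")
print(script)
conn.executescript(script)  # sqlite3's counterpart to running the file with sqlcmd -i
```

    Here the script holds DELETE statements for IDs 123, 124, and 125, and only row 126 (from 2015) survives.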
