How to update large table with millions of rows in SQL Server?

Backend · open · 6 answers · 815 views
面向向阳花, asked 2020-11-28 10:07

I have an UPDATE statement which can update more than a million records. I want to update them in batches of 1000 or 10000. I tried with @@ROWCOUNT, but ...

6 Answers
  • 2020-11-28 10:23

    This is a more efficient version of the solution from @Kramb. The existence check is redundant, as the UPDATE's WHERE clause already handles it; instead, you just grab the row count and compare it to the batch size.

    Also note that @Kramb's solution didn't filter already-updated rows out of the next iteration, so it would loop forever.

    It also uses the modern TOP (@batchSize) syntax instead of SET ROWCOUNT.

    DECLARE @batchSize INT, @rowsUpdated INT
    SET @batchSize = 1000;
    SET @rowsUpdated = @batchSize; -- Initialise for the while loop entry
    
    WHILE (@batchSize = @rowsUpdated)
    BEGIN
        UPDATE TOP (@batchSize) TableName
        SET Value = 'abc1'
        WHERE Parameter1 = 'abc' AND Parameter2 = 123 and Value <> 'abc1';
    
        SET @rowsUpdated = @@ROWCOUNT;
    END
    
  • 2020-11-28 10:25

    I want to share my experience. A few days ago I had to update 21 million records in a table with 76 million records. My colleague suggested the following approach. For example, we have this table 'Persons':

    Id | FirstName | LastName | Email            | JobTitle
    1  | John      |  Doe     | abc1@abc.com     | Software Developer
    2  | John1     |  Doe1    | abc2@abc.com     | Software Developer
    3  | John2     |  Doe2    | abc3@abc.com     | Web Designer
    

    Task: update persons to the new job title: 'Software Developer' -> 'Web Developer'.

    1. Create a temporary table 'Persons_SoftwareDeveloper_To_WebDeveloper (Id INT PRIMARY KEY)'.

    2. Select into the temporary table the persons that you want to update with the new job title:

    INSERT INTO Persons_SoftwareDeveloper_To_WebDeveloper
    SELECT Id FROM Persons WITH(NOLOCK) -- avoid locks
    WHERE JobTitle = 'Software Developer'
    OPTION(MAXDOP 1) -- use only one core
    

    Depending on the row count, this statement will take some time to fill your temporary table, but it avoids locks. In my situation it took about 5 minutes (21 million rows).

    3. The main idea is to generate micro SQL statements to update the database. So, let's print them:

    DECLARE @i INT, @pagesize INT, @totalPersons INT
    SET @i = 0
    SET @pagesize = 2000
    SELECT @totalPersons = MAX(Id) FROM Persons

    WHILE @i <= @totalPersons
    BEGIN
        PRINT '
    UPDATE persons
      SET persons.JobTitle = ''Web Developer''
      FROM Persons_SoftwareDeveloper_To_WebDeveloper tmp
      JOIN Persons persons ON tmp.Id = persons.Id
      WHERE persons.Id BETWEEN ' + CAST(@i AS varchar(20)) + ' AND ' + CAST(@i + @pagesize AS varchar(20)) + '
    PRINT ''Page ' + CAST(@i / @pagesize AS varchar(20)) + ' of ' + CAST(@totalPersons / @pagesize AS varchar(20)) + '''
    GO
    '
        SET @i = @i + @pagesize
    END
    

    After executing this script you will receive hundreds of batches, which you can run in a single tab of SQL Server Management Studio.
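    For illustration, each printed micro statement looks roughly like this (the Id range and page numbers are hypothetical and depend on your @i, @pagesize and @totalPersons values):

    UPDATE persons
      SET persons.JobTitle = 'Web Developer'
      FROM Persons_SoftwareDeveloper_To_WebDeveloper tmp
      JOIN Persons persons ON tmp.Id = persons.Id
      WHERE persons.Id BETWEEN 0 AND 2000
    PRINT 'Page 0 of 10500'
    GO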

    4. Run the printed SQL statements and check for locks on the table. You can always stop the process and play with @pagesize to speed the update up or down (don't forget to change @i after you pause the script).

    5. Drop the temporary table Persons_SoftwareDeveloper_To_WebDeveloper.

    A minor note: this migration can take a while, and new rows with invalid data could be inserted while it runs. So, first fix the places where your rows are added. In my situation I fixed the UI: 'Software Developer' -> 'Web Developer'.

  • 2020-11-28 10:29
    1. You should not be updating 10k rows in a set unless you are certain that the operation is getting Page Locks (due to multiple rows per page being part of the UPDATE operation). The issue is that Lock Escalation (from either Row or Page to Table locks) occurs at 5000 locks. So it is safest to keep it just below 5000, just in case the operation is using Row Locks.

    2. You should not be using SET ROWCOUNT to limit the number of rows that will be modified. There are two issues here:

      1. It has been deprecated since SQL Server 2005 was released (11 years ago):

        Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax

      2. It can affect more than just the statement you are dealing with:

        Setting the SET ROWCOUNT option causes most Transact-SQL statements to stop processing when they have been affected by the specified number of rows. This includes triggers. The ROWCOUNT option does not affect dynamic cursors, but it does limit the rowset of keyset and insensitive cursors. This option should be used with caution.

      Instead, use the TOP () clause.

    3. There is no purpose in having an explicit transaction here. It complicates the code and you have no handling for a ROLLBACK, which isn't even needed since each statement is its own transaction (i.e. auto-commit).

    4. Assuming you find a reason to keep the explicit transaction, then you do not have a TRY / CATCH structure. Please see my answer on DBA.StackExchange for a TRY / CATCH template that handles transactions:

      Are we required to handle Transaction in C# Code as well as in Store procedure
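    To make point 2 concrete, here is the same limit expressed both ways, using the table and column names from the question (the deprecated form caps most statements in the session until it is reset; TOP () applies to one statement only):

    -- Deprecated: caps most subsequent DML statements (and triggers) until reset
    SET ROWCOUNT 1000;
    UPDATE TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1';
    SET ROWCOUNT 0; -- easy to forget, leaving the whole session capped

    -- Preferred: the limit is scoped to this single statement
    UPDATE TOP (1000) TableName
    SET Value = 'abc1'
    WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1';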

    I suspect that the real WHERE clause is not being shown in the example code in the Question, so simply relying upon what has been shown, a better model would be:

    DECLARE @Rows INT,
            @BatchSize INT; -- keep below 5000 to be safe
    
    SET @BatchSize = 2000;
    
    SET @Rows = @BatchSize; -- initialize just to enter the loop
    
    BEGIN TRY    
      WHILE (@Rows = @BatchSize)
      BEGIN
          UPDATE TOP (@BatchSize) tab
          SET    tab.Value = 'abc1'
          FROM  TableName tab
          WHERE tab.Parameter1 = 'abc'
          AND   tab.Parameter2 = 123
          AND   tab.Value <> 'abc1' COLLATE Latin1_General_100_BIN2;
          -- Use a binary Collation (ending in _BIN2, not _BIN) to make sure
          -- that you don't skip differences that compare the same due to
          -- insensitivity of case, accent, etc, or linguistic equivalence.
    
          SET @Rows = @@ROWCOUNT;
      END;
    END TRY
    BEGIN CATCH
      RAISERROR(stuff);
      RETURN;
    END CATCH;
    

    By testing @Rows against @BatchSize, you can avoid that final UPDATE query (in most cases) because the final set is typically some number of rows less than @BatchSize, in which case we know that there are no more to process (which is what you see in the output shown in your answer). Only in those cases where the final set of rows is equal to @BatchSize will this code run a final UPDATE affecting 0 rows.
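    That saving can be sanity-checked with a small, self-contained script (illustrative only; run it in a scratch database, and note the 4999-row #Demo table is made up): with a batch size of 2000 the batches affect 2000, 2000, then 999 rows, and 999 <> 2000 ends the loop without a fourth, zero-row UPDATE.

    -- Build a throwaway table with 4999 rows that match the WHERE clause
    CREATE TABLE #Demo (Id INT IDENTITY PRIMARY KEY, Value varchar(10) NOT NULL);
    INSERT INTO #Demo (Value)
    SELECT TOP (4999) 'abc'
    FROM sys.all_objects a CROSS JOIN sys.all_objects b;

    DECLARE @Rows INT, @BatchSize INT, @Queries INT;
    SET @BatchSize = 2000;
    SET @Rows = @BatchSize;
    SET @Queries = 0;

    WHILE (@Rows = @BatchSize)
    BEGIN
        UPDATE TOP (@BatchSize) #Demo
        SET Value = 'abc1'
        WHERE Value <> 'abc1';

        SET @Rows = @@ROWCOUNT;
        SET @Queries = @Queries + 1;
    END;

    PRINT @Queries; -- 3 batches: 2000 + 2000 + 999 rows
    DROP TABLE #Demo;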

    I also added a condition to the WHERE clause to prevent rows that have already been updated from being updated again.

  • 2020-11-28 10:30
    WHILE EXISTS (SELECT * FROM TableName WHERE Value <> 'abc1' AND Parameter1 = 'abc' AND Parameter2 = 123)
    BEGIN
        UPDATE TOP (1000) TableName
        SET Value = 'abc1'
        WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1'
    END
    
  • 2020-11-28 10:32

    Your PRINT is messing things up, because it resets @@ROWCOUNT. Whenever you use @@ROWCOUNT, my advice is to assign it to a variable immediately. So:

    DECLARE @RC int;
    WHILE @RC > 0 OR @RC IS NULL
    BEGIN
        SET ROWCOUNT 5;

        UPDATE TableName
        SET Value = 'abc1'
        WHERE Parameter1 = 'abc' AND Parameter2 = 123 AND Value <> 'abc1';

        SET @RC = @@ROWCOUNT;
        PRINT(@RC);
    END;

    SET ROWCOUNT 0;
    

    Another nice feature is that you don't need to repeat the update code.

  • 2020-11-28 10:33

    First of all, thank you all for your input. I tweaked my Query 1 and got my desired result. Gordon Linoff is right: PRINT was messing up my query, so I modified it as follows:

    Modified Query - 1:

    SET ROWCOUNT 5
    WHILE (1 = 1)
      BEGIN
        BEGIN TRANSACTION
    
            UPDATE TableName 
            SET Value = 'abc1' 
            WHERE Parameter1 = 'abc' AND Parameter2 = 123
    
            IF @@ROWCOUNT = 0
              BEGIN
                    COMMIT TRANSACTION
                    BREAK
              END
        COMMIT TRANSACTION
      END
    SET ROWCOUNT 0
    

    Output:

    (5 row(s) affected)
    
    (5 row(s) affected)
    
    (4 row(s) affected)
    
    (0 row(s) affected)
    