Question
I'm writing a process that archives rows from a SQL Server table based on a datetime column. I want to move all the rows with a date before X, but the problem is that there are millions of rows for each date, so doing a BEGIN TRANSACTION...INSERT...DELETE...COMMIT for each date takes too long, and locks up the database for other users.
Is there a way that I can do it in smaller chunks? Maybe using ROWCOUNT or something like that?
I'd originally considered something like this:
SET ROWCOUNT 1000
DECLARE @RowsLeft DATETIME
DECLARE @ArchiveDate DATETIME
SET @RowsLeft = (SELECT TOP 1 dtcol FROM Events WHERE dtcol <= @ArchiveDate)
WHILE @RowsLeft IS NOT NULL
BEGIN
    INSERT INTO EventsBackups
    SELECT TOP 1000 * FROM Events
    DELETE Events
    SET @RowsLeft = (SELECT TOP 1 dtcol FROM Events WHERE dtcol <= @ArchiveDate)
END
But then I realized that I can't guarantee that the rows I'm deleting are the ones I just backed up. Or can I...?
UPDATE: Another option I'd considered was adding a step:
1. SELECT TOP 1000 rows that meet my date criteria into a temp table
2. Begin transaction
3. Insert from temp table into archive table
4. Delete from source table, joining to temp table across every column
5. Commit transaction
6. Repeat 1-5 until no rows remain that meet the date criteria
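Roughly, that series would look like the sketch below (the EventID key and column names are illustrative, and step 4's join across every column is simplified to a key join here):

    SELECT TOP 1000 *
    INTO #Batch
    FROM Events
    WHERE dtcol <= @ArchiveDate

    BEGIN TRANSACTION
        INSERT INTO EventsBackups
        SELECT * FROM #Batch

        DELETE Events
        FROM Events
            INNER JOIN #Batch b ON Events.EventID = b.EventID
    COMMIT TRANSACTION

    DROP TABLE #Batch
    -- repeat until the SELECT ... INTO #Batch step returns no rows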
Does anybody have an idea for how the expense of this series might compare to some of the other options discussed below?
DETAIL: I'm using SQL 2005, since somebody asked.
Answer 1:
Just INSERT the result of the DELETE:
WHILE 1 = 1
BEGIN
    WITH EventsTop1000 AS (
        SELECT TOP 1000 *
        FROM Events
        WHERE <yourconditionofchoice>)
    DELETE EventsTop1000
        OUTPUT DELETED.*
        INTO EventsBackups;
    IF (@@ROWCOUNT = 0)
        BREAK;
END
This is atomic and consistent.
Answer 2:
Use an INSERT with an OUTPUT INTO clause to store the keys of the inserted rows, then DELETE joining to this table variable to remove only those rows:
DECLARE @TempTable TABLE (YourKeyValue KeyDatatype NOT NULL)

INSERT INTO EventsBackups
    (PrimaryKey, column1, column2, column3)
OUTPUT INSERTED.PrimaryKey
    INTO @TempTable
SELECT TOP 1000
    PrimaryKey, column1, column2, column3
FROM Events

DELETE Events
FROM Events
    INNER JOIN @TempTable t ON Events.PrimaryKey = t.YourKeyValue
Answer 3:
How about:
INSERT INTO EventsBackups
SELECT TOP 1000 * FROM Events ORDER BY YourKeyField
DELETE Events
WHERE YourKeyField IN (SELECT TOP 1000 YourKeyField FROM Events ORDER BY YourKeyField)
Answer 4:
How about not doing it all at once?

INSERT INTO EventsBackups
SELECT * FROM Events WHERE <date criteria>

Then later,

DELETE Events
FROM Events
    INNER JOIN EventsBackups ON Events.ID = EventsBackups.ID

or the equivalent.
Nothing you've said so far suggests you need a transaction.
Answer 5:
Have you got an index on the date field? If you haven't, SQL Server may be forced to escalate to a table lock, which will lock out all your users while your archive statements run.
I think you will need an index for this operation to perform at all well! Put an index on your date field and try your operation again!
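For example, a plain nonclustered index on the date column would look something like this (the index name is illustrative):

    CREATE NONCLUSTERED INDEX IX_Events_dtcol ON Events (dtcol)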
Answer 6:
Could you make a copy of Events, move all rows with dates >= x into it, drop Events and rename the copy to Events? Or copy, truncate and then copy back? If you can afford a little downtime, this would probably be the quickest approach.
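A rough sketch of the rename variant (table and column names are illustrative; indexes, constraints, and permissions have to be recreated on the new table):

    -- archive the old rows first, then keep only the recent ones in a new copy
    INSERT INTO EventsBackups
    SELECT * FROM Events WHERE dtcol < @ArchiveDate

    SELECT * INTO Events_keep FROM Events WHERE dtcol >= @ArchiveDate

    DROP TABLE Events
    EXEC sp_rename 'Events_keep', 'Events'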
Answer 7:
Here's what I ended up doing:
SET @CleanseFilter = @startdate
WHILE @CleanseFilter IS NOT NULL
BEGIN
    BEGIN TRANSACTION
        INSERT INTO ArchiveDatabase.dbo.MyTable
        SELECT *
        FROM dbo.MyTable
        WHERE startTime BETWEEN @startdate AND @CleanseFilter

        DELETE dbo.MyTable
        WHERE startTime BETWEEN @startdate AND @CleanseFilter
    COMMIT TRANSACTION

    SET @CleanseFilter = (SELECT MAX(starttime)
                          FROM (SELECT TOP 1000 starttime
                                FROM dbo.MyTable
                                WHERE startTime BETWEEN @startdate AND @enddate
                                ORDER BY starttime) a)
END
I'm not pulling exactly 1000, just 1000ish, so it handles repeats in the time column appropriately (something I worried about when I considered using ROWCOUNT). Since there are often repeats in the time column, I see it regularly move 1002 or 1004 rows/iteration, so I know it's getting everything.
I'm submitting this as an answer so it can be judged up against the other solutions people have provided. Let me know if there's something obviously wrong with this method. Thanks for your help, everybody, and I'll accept whichever answer has the most votes in a few days.
Answer 8:
Another option would be to add a trigger to the Events table that does nothing but add the same record to the EventsBackups table.
That way EventsBackups is always up to date, and all you do is periodically purge records from your Events table.
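A minimal sketch of such a trigger (assuming EventsBackups has the same column layout as Events; this only mirrors new inserts, not updates or deletes):

    CREATE TRIGGER trg_Events_Backup ON Events
    AFTER INSERT
    AS
    BEGIN
        SET NOCOUNT ON
        -- copy every newly inserted row straight into the backup table
        INSERT INTO EventsBackups
        SELECT * FROM inserted
    END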
Source: https://stackoverflow.com/questions/864341/move-sql-server-data-in-limited-1000-row-chunks