Determining row changes since last access in SQL Server


Question


We have a multiuser system in which users save to a central SQL Server 2005 database. We have encountered an issue where one user refreshes changes from the database while another user is saving new data. The way we currently collect changes is via a timestamp column on each table that is set on every row insert/update. Each client also stores a timestamp recording the last time it pulled from the database.

Each save is done in a transaction. The example we are dealing with is as follows:

  • User1 starts a save, opening a transaction and inserting/modifying rows, changing their timestamps.
  • User2 refreshes from the database before User1 has committed the changes, somehow causing User2's timestamp to update.
  • User 1 commits the transaction and all changes.
  • User2 refreshes from the database again; however, because his timestamp was updated previously, only the second half of the changes committed by User1 are pulled in, causing errors and application crashes.

This is making us think that timestamps aren't necessarily the best method to use to determine database changes since the last access by the front-end system. What would a better solution be?

Further example

  • User1 starts a save, opening a transaction and inserting/modifying rows and updating their timestamps.
  • User2 starts another save, opens a transaction, inserts/modifies OTHER rows (updating their timestamps), and commits his transaction.
  • User3 refreshes from the database and pulls down all the data that User2 committed, updating his LastRefreshTimestamp to the last timestamp created in the db by User2.
  • User1 commits his transaction.
  • User3 refreshes from the database again, but based on his LastRefreshTimestamp he pulls only the changes stamped between the end of User2's transaction and the end of User1's transaction, missing all the changes User1's transaction stamped before User2's transaction began.

Answer 1:


Interesting problem, and I can't think of a simple clean T-SQL-based solution, but this is exactly the type of synchronization challenge that Change Tracking in SQL 2008 was created for... http://msdn.microsoft.com/en-us/library/bb933875.aspx

Nice very-high-level overview of change tracking vs Change Data Capture in this blog/article: http://blogs.technet.com/b/josebda/archive/2009/03/24/sql-server-2008-change-tracking-ct-and-change-data-capture-cdc.aspx

And you can potentially combine this with Microsoft Sync Framework if your general aim is to keep client-side copies of the repository: http://msdn.microsoft.com/en-us/sync/bb887608
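
As a rough illustration, here is a minimal sketch of enabling and querying Change Tracking, assuming a hypothetical database MyDatabase and a table dbo.MyTable with an integer primary key Id (all names are illustrative); the client stores the version number returned by CHANGE_TRACKING_CURRENT_VERSION() instead of a datetime:

-- Enable change tracking on the database and the table (SQL Server 2008+).
ALTER DATABASE MyDatabase
SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);

ALTER TABLE dbo.MyTable
ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = ON);
GO

-- On each refresh: pull everything changed since the client's last sync
-- version, then record the current version for the next refresh.
DECLARE @last_sync_version bigint = 0;  -- stored on the client between refreshes

SELECT CT.Id, CT.SYS_CHANGE_OPERATION, T.*
FROM CHANGETABLE(CHANGES dbo.MyTable, @last_sync_version) AS CT
LEFT JOIN dbo.MyTable T ON T.Id = CT.Id;  -- a NULL row means the Id was deleted

SELECT CHANGE_TRACKING_CURRENT_VERSION() AS NextSyncVersion;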




Answer 2:


I think I understand your question:

Each client maintains a disconnected data set and periodically refreshes their local copy of the data set.

You use a SQL-92 timestamp (which is different from a SQL Server Timestamp; one is a datetime, one is a binary row version) to track updates so you can find the delta in the dataset and update the clients.

This approach is causing problems because your timestamps are calculated before the transaction is fully committed; subsequent transactions calculate newer timestamps but may commit before the older transactions, so your "latest timestamp" misses some or all of the records.

So what can be done?

The very fact that this is such a "difficult nut to crack" is a good indication that this is an atypical design pattern, although I assume changing this model is out of the question.

Here are a couple of solutions that might work.

  1. Build a "margin of error" into your refreshes. If your last refresh was 2011-08-06 23:14:05, subtract a few minutes (you'll have to figure out what that margin of error is) and pull the updates since then. This would be your quick-fix band-aid solution. If you want to refine it, use a SQL Server timestamp (automatic binary row version), calculate a checksum of all the rows, store that in your local dataset, and compare rows during your refresh.

  2. Lazy refreshes. Again, use a SQL Server timestamp (rowversion) for row versioning and check for changes before a user is allowed to edit; if the timestamps don't match, do a refresh. When checking, use the NOWAIT hint to determine whether the row is currently being edited. Typically you would perform this check again when saving to ensure there is not a conflict (http://www.mssqltips.com/tip.asp?tip=1501); this is the more typical approach. A sketch of the check appears after this list.
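
A minimal sketch of the option-2 check, assuming a hypothetical dbo.MyTable with an Id key, a rowversion column named RowVer, and a data column SomeColumn (all names are illustrative):

-- @Id and @NewValue would be parameters passed in from the client in practice.
DECLARE @Id int, @NewValue nvarchar(100), @RowVerReadByClient binary(8);
SET @Id = 1;                 -- key of the row being edited (illustrative)
SET @NewValue = N'example';  -- new value from the client (illustrative)

-- When the user opens the row for editing: read it with the NOWAIT hint so the
-- statement fails immediately if another session currently has the row locked.
SELECT @RowVerReadByClient = RowVer
FROM dbo.MyTable WITH (NOWAIT)
WHERE Id = @Id;

-- At save time: apply the update only if the rowversion still matches what the
-- client originally read.
UPDATE dbo.MyTable
SET SomeColumn = @NewValue
WHERE Id = @Id
  AND RowVer = @RowVerReadByClient;

IF @@ROWCOUNT = 0
    RAISERROR('Row was changed by another user since it was loaded; refresh and retry.', 16, 1);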

Ultimately, you shouldn't need to maintain your entire dataset on the client. SQL Server is very good at handling concurrency and multiple users. If you have to do searches on the local data, it may be better to perform these searches on the server. If you're running into blocking issues, it may be better to focus on trying to shorten your transaction durations. In general it's a terrible idea for a transaction to be open waiting for user input. Your question indicates this might be taking place but I hate to assume anything.

Beyond that, the ideas/options get progressively worse/crazy (e.g., you could put SQL Server Express instances on the clients, use replication to push changes out, and use those instances as the local data store, but replication latency can be a major issue there and will likely cause more problems than it solves).

There is another possibility

I may be wrong in my interpretation of your question. You may be calculating your update timestamp on the client and then later saving this to the server. If this is the case, instead of passing an explicit datetime to the server, pass in GETDATE(). This will calculate the datetime on the fly.

If there is too much code to refactor, you could also handle this with a trigger:

Simple Example:

-- Stamp every inserted/updated row with the server's current date/time.
CREATE TRIGGER Trig_MyTable_Update ON dbo.MyTable FOR INSERT, UPDATE
AS
IF @@ROWCOUNT = 0 RETURN;
-- Join back to the base table through the inserted pseudo-table so only the
-- rows touched by the triggering statement are stamped.
UPDATE U SET UpdateDT = GETDATE()
FROM dbo.MyTable U INNER JOIN inserted I ON U.Id = I.Id;



Answer 3:


I assume a timestamp here means the rowversion binary(8) type. This increments for any reason at all: for example, if you do UPDATE ... SET col1 = col1.

We found too many false positives relying on timestamps.

Let's say User5 loads data, but User4 changes a row and then reverts it (after all, they are end users...): you'll have two timestamp changes but the same underlying data.

A real datetime-based system, even one accurate to a second or less, will eventually fail under high load, and you have the same problem anyway.

We solved this by

  • not using timestamps (they increment with dummy updates)
  • using REPEATABLE READ isolation
  • each save sends old values too
  • comparing every value of the row to be updated

So User3 loads one set of data that is then changed on the server. When they save, any change in the underlying values throws an error (see the sketch below).
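
A minimal sketch of that value comparison, assuming a hypothetical dbo.MyTable where the client sends back the old values it loaded along with the new ones (all names are illustrative):

-- @Id and the old/new values would come from the client in practice.
DECLARE @Id int, @OldCol1 int, @OldCol2 nvarchar(50), @NewCol1 int, @NewCol2 nvarchar(50);

-- The update succeeds only if every column still holds the value the client loaded.
-- (NULLable columns would need explicit IS NULL handling.)
UPDATE dbo.MyTable
SET Col1 = @NewCol1,
    Col2 = @NewCol2
WHERE Id = @Id
  AND Col1 = @OldCol1
  AND Col2 = @OldCol2;

IF @@ROWCOUNT = 0
    RAISERROR('Values changed on the server since they were loaded.', 16, 1);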

If you can disallow UPDATEs where no data actually changed, so that the timestamp isn't bumped, then you don't need this.

You can add an extra column tracking a dummy update if you want, so a user "save" is acknowledged even when it isn't a "real" update, but this still changes the rowversion value.

Hence our solution above...




Answer 4:


Add an update of the timestamps for any row created/updated just before the commit.

In any case, user-side checks do not replace server-side checks (and constraints), so this mechanism is only for the comfort of the users, not the last means of data validation...




Answer 5:


Setting the timestamp on the client side to the date/time of the last pull, combined with timestamping during the transaction on the server side, is where your problem originates. Either update the timestamps of all affected records as the last action in your transaction (a sketch follows below), though you may still run into the timestamp-resolution problem, or change your pull logic to select records based on timestamp differences between client and server instead of comparing all record timestamps to a single "pull" date/time.
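
A minimal sketch of the first option, re-stamping every affected row as the final step of the transaction (table, column, and variable names are illustrative):

-- Ids touched by this save, collected as the inserts/updates run (illustrative).
DECLARE @AffectedIds TABLE (Id int PRIMARY KEY);

BEGIN TRANSACTION;

-- ... the save's inserts and updates go here, adding each Id to @AffectedIds ...

-- Last action before committing: give every row touched by this save a fresh
-- timestamp, so the stamps reflect (approximately) the commit time rather than
-- the moment each row was first modified inside the open transaction.
UPDATE dbo.MyTable
SET UpdateDT = GETDATE()
WHERE Id IN (SELECT Id FROM @AffectedIds);

COMMIT TRANSACTION;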




Answer 6:


Look into AFTER triggers (http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/triggers.htm#CNCPT017). A database trigger is a special type of stored procedure that is invoked automatically when a predefined event occurs, for example when a user saves new data.

USE database
GO
-- Log who changed the table, what kind of change it was, and when.
CREATE TRIGGER on_table_update_insert_delete
ON dbo.[table] AFTER UPDATE, DELETE, INSERT
AS
BEGIN
    DECLARE @string AS char(1);

    -- Rows in both inserted and deleted mean an UPDATE; only inserted means an
    -- INSERT; only deleted means a DELETE.
    IF EXISTS (SELECT * FROM inserted)
        IF EXISTS (SELECT * FROM deleted)
            SET @string = 'U';
        ELSE
            SET @string = 'I';
    ELSE
        SET @string = 'D';

    INSERT INTO event_logs (username, event_type, event_time, table_name)
    SELECT SYSTEM_USER, @string, GETDATE(), 'Table';
END
GO

The trigger is enabled by default when it is created. If it has been disabled, re-enable it with the table-scoped form (ON DATABASE applies only to DDL triggers):

USE database
GO
ENABLE TRIGGER on_table_update_insert_delete ON dbo.[table]
GO

Thanks :)




Answer 7:


In an interactive system, timestamps with a resolution of one second are usually adequate. Higher-resolution timestamps lessen the risk of update failure. Failure often points to non-atomic updates, or to cases involving multiple transactions against the same data for the same request.

Your problem looks like a non-atomic update, or an isolation level allowing non-committed changes to be seen. In any case, the application should handle having the data updated without crashing. Normally, some sort of "data updated by another user" error should be presented.

The most recent timestamp on the data being considered for update can be used by the application to ensure the data isn't modified after being displayed but before being updated. Batch updates may cause problems with timestamps, as they can be done extremely rapidly.

Transaction counters can be used, but the counter for each record needs to be tracked.




Answer 8:


The timestamp type suffers in some SQL implementations when multiple transactions can occur during the same "second". For SQL implementations with high-resolution timers, for example nanoseconds, it's unlikely to be a problem.

Instead, how about just keeping a transaction counter somewhere? Each time an update is done, increment it.
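
A minimal sketch of such a counter, assuming a hypothetical single-row table dbo.ChangeCounter and a ChangeVersion column on the data table (all names are illustrative):

-- One-time setup: a single-row counter table.
CREATE TABLE dbo.ChangeCounter (CurrentValue bigint NOT NULL);
INSERT INTO dbo.ChangeCounter (CurrentValue) VALUES (0);
GO

-- During each save: increment and read the counter in one atomic statement,
-- then stamp the saved rows with the new value. Clients refresh by asking for
-- rows with ChangeVersion greater than the last value they saw.
DECLARE @NewVersion bigint, @Id int;
SET @Id = 1;  -- illustrative key of a saved row

UPDATE dbo.ChangeCounter
SET @NewVersion = CurrentValue = CurrentValue + 1;

UPDATE dbo.MyTable
SET ChangeVersion = @NewVersion  -- hypothetical bigint column on the data table
WHERE Id = @Id;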



Source: https://stackoverflow.com/questions/3809751/determining-row-changes-since-last-access-in-sql-server
