To NOLOCK or NOT to NOLOCK, that is the question

时光说笑 2021-01-31 11:57

This is really more of a discussion than a specific question about nolock.

I took over an app recently in which almost every query (and there are lots of them) has the NOLOCK hint on it.

5 Answers
  • 2021-01-31 12:03

    If somebody says that without NOLOCK their application always gets deadlocked, then there is (more than likely) a problem with their queries. A deadlock means that two transactions cannot proceed because of resource contention and the problem cannot be resolved. An example:

    Consider Transactions A and B. Both are in-flight. Transaction A has inserted a row into table X and Transaction B has inserted a row into table Y, so Transaction A has an exclusive lock on X and Transaction B has an exclusive lock on Y.

    Now, Transaction A needs to run a SELECT against table Y and Transaction B needs to run a SELECT against table X.

    The two transactions are deadlocked: A needs resource Y and B needs resource X. Since neither transaction can proceed until the other completes, the situation cannot be resolved: neither transaction's demand for a resource can be satisfied until the other transaction releases its lock on the resource in contention (either by ROLLBACK or COMMIT, doesn't matter).

    SQL Server identifies this situation, selects one transaction or the other as the deadlock victim, aborts that transaction and rolls it back, leaving the other transaction free to proceed to its presumable completion.
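    To make that concrete, here is a minimal sketch of the pattern (tables X and Y are hypothetical, one int column each); run the two sessions in separate connections, interleaving the statements in the numbered order:

        -- Session A
        BEGIN TRAN;
        INSERT INTO X (id) VALUES (1);   -- (1) A takes an exclusive lock in X

        -- Session B
        BEGIN TRAN;
        INSERT INTO Y (id) VALUES (1);   -- (2) B takes an exclusive lock in Y

        -- Session A
        SELECT id FROM Y;                -- (3) blocks behind B's lock

        -- Session B
        SELECT id FROM X;                -- (4) blocks behind A's lock: deadlock.
                                         --     SQL Server picks a victim, aborts
                                         --     it with error 1205, and the
                                         --     survivor proceeds.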

    Deadlocks are rare in real life (IMHO). One rectifies them by

    • ensuring that transaction scope is as small as possible, something SQL Server does automatically (SQL Server's default transaction scope is a single statement with an implicit COMMIT), and
    • ensuring that transactions access resources in the same sequence. In the example above, if transactions A and B both locked resources X and Y in the same sequence, there would not be a deadlock.
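
    A minimal sketch of that second point, reusing the hypothetical X and Y tables: if both transactions touch the tables in the same order, the later arrival simply waits instead of deadlocking.

        -- Both sessions run the same sequence: X first, then Y.
        BEGIN TRAN;
        UPDATE X SET id = id WHERE id = 1;   -- the second session to arrive
                                             -- blocks here...
        UPDATE Y SET id = id WHERE id = 1;   -- ...so the first one reaches
        COMMIT;                              -- COMMIT, releases its locks, and
                                             -- the waiter proceeds. No deadlock.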

    Timeouts

    A timeout, on the other hand, occurs when a transaction exceeds its wait time and is rolled back due to resource contention. For instance, Transaction A needs resource X. Resource X is locked by Transaction B, so Transaction A waits for the lock to be released. If the lock isn't released within the query's timeout limit, the waiting transaction is aborted and rolled back. Every query has a query timeout associated with it (the default value is 30s, I believe), after which time the transaction is aborted and rolled back. The query timeout can be set to 0s, in which case SQL Server will let the query wait forever.
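
    Note that the ~30-second default is the client-side command timeout (for example, ADO.NET's SqlCommand.CommandTimeout); on the server side, SQL Server will happily wait on a lock forever unless you also set a lock timeout. A quick way to see the behavior from a query window (table name hypothetical):

        SET LOCK_TIMEOUT 5000;      -- give up on any lock wait after 5 seconds

        SELECT * FROM dbo.Orders;   -- if another transaction holds a conflicting
                                    -- lock for more than 5s, this fails with
                                    -- error 1222 ("Lock request time out period
                                    -- exceeded") instead of blocking indefinitely

        SET LOCK_TIMEOUT -1;        -- back to the default: wait indefinitely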

    This is probably what they are talking about. In my experience, timeouts like this usually occur in big databases when large batch jobs are updating thousands and thousands of records in a single transaction, although they can also happen because a transaction goes on too long (connect to your production database in Query Analyzer, execute BEGIN TRANSACTION, update a single row in a frequently hit table, and go to lunch without executing ROLLBACK or COMMIT TRANSACTION, and see how long it takes for the production DBAs to go apes**t on you. Don't ask me how I know this.)

    This sort of timeout is usually what results in splattering perfectly innocent SQL with all sorts of NOLOCK hints.

    [TIP: if you're going to do that, just execute SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED as the first statement in your stored procedure and have done with it.]
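
    In practice that tip looks like this (procedure and table names are made up); the SET applies to every statement that follows in the procedure, and the previous isolation level is restored when the procedure exits:

        CREATE PROCEDURE dbo.GetDashboardCounts
        AS
        BEGIN
            -- One statement instead of a NOLOCK hint on every table reference:
            SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

            SELECT COUNT(*) AS OpenOrders  FROM dbo.Orders WHERE Status = 'OPEN';
            SELECT COUNT(*) AS ActiveUsers FROM dbo.Users  WHERE IsActive = 1;
        END;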

    The problem with this approach (NOLOCK/READ UNCOMMITTED) is that you can read uncommitted data from other transactions: stuff that is incomplete or that may get rolled back later, so your data integrity is compromised. You might be sending out a bill based on data with a high level of bogosity.
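
    A minimal two-session sketch of that failure mode (hypothetical Invoices table):

        -- Session A
        BEGIN TRAN;
        UPDATE Invoices SET Amount = 99999 WHERE InvoiceId = 42;
        -- ...still uncommitted...

        -- Session B
        SELECT Amount
        FROM Invoices WITH (NOLOCK)
        WHERE InvoiceId = 42;    -- reads 99999: dirty, uncommitted data

        -- Session A
        ROLLBACK;                -- the 99999 never officially existed, but
                                 -- Session B may already have billed with it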

    My general rule is that one should avoid the use of table hints insofar as possible. Let SQL Server and its query optimizer do their jobs.

    The right way to avoid this sort of issue is to avoid the sort of transactions (insert a million rows in one fell swoop, for instance) that cause the problems. The locking strategy implicit in relational database SQL is designed around small transactions of short scope. Locks should be small in scope and short in duration. Think "bank teller updating somebody's checking account with a deposit" as the underlying use case. Design your processes to work in that model and you'll be much happier all the way 'round.

    Instead of inserting a million rows in one mondo insert statement, do the work in independent chunks and commit each chunk independently. If your million-row insert dies after processing 999,000 rows, all the work done is lost (not to mention that the rollback can be a b*tch, and the table is still locked during rollback as well.) If you insert the million rows in blocks of 1,000 rows each, committing after each block, you avoid the lock contention that causes deadlocks, as locks will be obtained and released and things will keep moving. If something goes south in the 999th block of 1,000 rows and the transaction gets aborted and rolled back, you've still gotten 998,000 rows inserted; you've only lost 1,000 rows of work. Restart/retry is much easier.
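
    A sketch of that chunking pattern in T-SQL (source/target names are hypothetical), committing one batch at a time:

        DECLARE @batch int = 1000, @rows int = 1;

        WHILE @rows > 0
        BEGIN
            BEGIN TRAN;

            -- Move the next batch of not-yet-copied rows.
            INSERT INTO dbo.Target (Id, Payload)
            SELECT TOP (@batch) s.Id, s.Payload
            FROM dbo.Source AS s
            WHERE NOT EXISTS (SELECT 1 FROM dbo.Target AS t WHERE t.Id = s.Id);

            SET @rows = @@ROWCOUNT;   -- 0 when there is nothing left to copy

            COMMIT;                   -- locks are released after every batch,
                                      -- so other work can interleave
        END;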

    Also, lock escalation occurs in large transactions. For efficiency, SQL Server escalates locks to a larger scope as the number of locks held by a transaction increases. If a transaction inserts/updates/deletes a single row in a table, it takes a row lock. Keep doing that, and once the number of locks that transaction holds against that table crosses a threshold (roughly 5,000), SQL Server escalates the locking strategy: the individual row (or page) locks are consolidated and converted into a single table lock, greatly increasing the scope of the lock held. (Escalation goes straight from rows or pages to the whole table; SQL Server never converts row locks into page locks.) From that point forward the transaction locks the entire table, and nobody else may play until the transaction commits or rolls back.
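
    You can watch the escalation happen in sys.dm_tran_locks, and from SQL Server 2008 on you can control it per table; a sketch (table name hypothetical):

        -- How many locks of each granularity does each session hold?
        SELECT request_session_id, resource_type, COUNT(*) AS lock_count
        FROM sys.dm_tran_locks
        GROUP BY request_session_id, resource_type;

        -- Forbid escalation to a table lock on one hot table (SQL 2008+).
        -- Trade-off: thousands of row locks consume lock memory instead.
        ALTER TABLE dbo.Orders SET (LOCK_ESCALATION = DISABLE);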

    Whether you can functionally avoid the use of NOLOCK/READ UNCOMMITTED is entirely dependent on the nature of the processes hitting the underlying database (and the culture of the organization owning it).

    Myself, I try to avoid its use as much as possible.

    Hope this helps.

  • 2021-01-31 12:03

    In a traditional normalized OLTP environment, NOLOCK is a code smell and almost certainly unnecessary in a properly designed system.

    In a dimensional model, I used NOLOCK extensively to avoid taking locks on very large fact and dimension tables that were simultaneously being populated with later fact data (and whose dimension rows might be expiring). In the dimensional model, the facts either never change or never change after a certain point. Similarly, any dimension which is referenced will also be static, so, for example, NOLOCK will stop your long analysis operation on yesterday's data from blocking a dimension expiration during a data load for today's data.
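
    For example, a long-running analysis query over settled facts might look like this (schema is hypothetical); the hints keep the report's shared locks from ever blocking the nightly load:

        SELECT d.CustomerName, SUM(f.SaleAmount) AS Total
        FROM dbo.FactSales   AS f WITH (NOLOCK)  -- facts never change once written
        JOIN dbo.DimCustomer AS d WITH (NOLOCK)  -- dimension rows expire, never mutate
          ON d.CustomerKey = f.CustomerKey
        WHERE f.DateKey < 20210131               -- yesterday's data and earlier only
        GROUP BY d.CustomerName;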

  • 2021-01-31 12:22

    You should only use NOLOCK on an unchanging table. Of course, on such a table it then behaves the same as Read Committed Snapshot. Without the snapshot, you are only saving the time it takes to acquire and then release a shared lock, which in most cases isn't necessary.

    As for a changing table... NOLOCK doesn't just mean getting a row before a transaction is done updating all of its rows. You can get ghost data as data pages split, or even index pages split. Or no data at all. That alone scared me away, but I think there may be even more scenarios where you simply get the wrong data.

    Of course, NOLOCK for getting rough estimates or to just check in on a process might be reasonable.
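
    For instance, a progress check on a big load, where an approximate answer is perfectly fine (table name hypothetical):

        SELECT COUNT(*) AS rows_loaded_so_far
        FROM dbo.StagingImport WITH (NOLOCK);   -- a dirty read is acceptable here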

    Basic rule of thumb: if you care about the data at all, and the data is changing, then do not use NOLOCK.

  • 2021-01-31 12:23

    SQL Server added snapshot isolation in SQL Server 2005; it enables you to read the latest committed value without having to wait for locks. Stack Overflow is also using snapshot isolation. The snapshot isolation level is more or less the same as what Oracle uses, which is why deadlocks are not very common on an Oracle box. Just be aware that you'll need plenty of tempdb space if you do enable it, since that is where the row versions are kept.

    From Books Online:

    When the READ_COMMITTED_SNAPSHOT database option is set ON, read committed isolation uses row versioning to provide statement-level read consistency. Read operations require only SCH-S table level locks and no page or row locks. When the READ_COMMITTED_SNAPSHOT database option is set OFF, which is the default setting, read committed isolation behaves as it did in earlier versions of SQL Server. Both implementations meet the ANSI definition of read committed isolation.
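
    Enabling either flavor is a database-level switch; a sketch against a hypothetical database name:

        -- Statement-level versioning: plain READ COMMITTED readers stop
        -- blocking behind writers, with no application changes needed.
        ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;

        -- Transaction-level versioning: sessions must opt in explicitly.
        ALTER DATABASE MyAppDb SET ALLOW_SNAPSHOT_ISOLATION ON;

        -- A session that wants a transactionally consistent view:
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
        BEGIN TRAN;
        SELECT COUNT(*) FROM dbo.Orders;   -- sees data as of the transaction's
        COMMIT;                            -- snapshot, ignoring concurrent writers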

  • 2021-01-31 12:25

    No, there is no need to use NOLOCK. Links: SO 1

    As for load, we deal with 2,000 rows per second, which is small change compared to 35k TPS.

    Deadlocks are caused by lock contention, usually from inconsistent write order on tables across transactions. ORMs especially are rubbish at this. We get them very infrequently. A well-written DAL should also retry, as per MSDN.
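
    A sketch of that retry advice in T-SQL, catching the deadlock-victim error (1205) and re-running the unit of work (table names hypothetical):

        DECLARE @retries int = 3;

        WHILE @retries > 0
        BEGIN
            BEGIN TRY
                BEGIN TRAN;
                -- the real unit of work goes here:
                UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE Id = 1;
                UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE Id = 2;
                COMMIT;
                BREAK;   -- success: leave the retry loop
            END TRY
            BEGIN CATCH
                IF XACT_STATE() <> 0 ROLLBACK TRAN;
                IF ERROR_NUMBER() <> 1205 THROW;   -- only retry deadlock victims
                SET @retries -= 1;                 -- we were the victim: try again
            END CATCH;
        END;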
