In postgres, if I run the following statement:
    update table set col = 1 where col = 2
In the default READ COMMITTED isolation level, from multiple concurrent sessions, is it guaranteed that only one session will actually update the matching row?
Your stated guarantees apply in this simple case, but not necessarily in slightly more complex queries. See the end of the answer for examples.
Assuming that `col` is unique, has exactly one row with the value 2, or has a stable ordering so that every `UPDATE` matches the same rows in the same order:
What'll happen for this query is that the threads will find the row with col=2 and all try to grab a write lock on that tuple. Exactly one of them will succeed. The others will block waiting for the first thread's transaction to commit.
That first tx will write, commit, and return a rowcount of 1. The commit will release the lock.
The other tx's will again try to grab the lock. One by one they'll succeed. Each transaction will in turn go through the following process:
- Obtain the write lock on the contested tuple.
- Re-check the `WHERE col=2` condition after getting the lock.
- See that the condition no longer matches, so the `UPDATE` will skip that row.
- Find that the `UPDATE` has no other rows, so it will report zero rows updated.
- Commit, releasing the lock for the next tx in line.

In this simple case the row-level locking and the condition re-check effectively serialize the updates. In more complex cases, not so much.
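For example, here's a minimal two-session sketch of that sequence, assuming a table `test` with a single row where `col = 2`:

    -- Session 1:
    BEGIN;
    UPDATE test SET col = 1 WHERE col = 2;  -- locks the tuple; reports UPDATE 1

    -- Session 2, while session 1 is still open:
    BEGIN;
    UPDATE test SET col = 1 WHERE col = 2;  -- blocks on session 1's row lock

    -- Session 1:
    COMMIT;                                 -- releases the row lock

    -- Session 2 wakes, re-checks WHERE col = 2 against the new row version,
    -- finds it no longer matches, and reports UPDATE 0.
    COMMIT;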
You can easily demonstrate this. Open, say, four psql sessions. In the first, lock the table with `BEGIN; LOCK TABLE test;`*. In the rest of the sessions, run identical `UPDATE`s - they'll block on the table-level lock. Now release the lock by `COMMIT`ting your first session and watch them race. Only one will report a row count of 1; the others will report 0. This is easily automated and scripted for repetition and scaling up to more connections/threads.
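Concretely, the demonstration looks something like this (the table name `test` and the values are just placeholders):

    -- One-time setup:
    CREATE TABLE test (col int);
    INSERT INTO test VALUES (2);

    -- Session 1: take a table lock so the other sessions queue up behind it.
    BEGIN;
    LOCK TABLE test;

    -- Sessions 2-4: each runs the same statement and blocks on the table lock.
    UPDATE test SET col = 1 WHERE col = 2;

    -- Session 1 again: release the lock and watch the updates race.
    COMMIT;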
To learn more, read rules for concurrent writing, page 11 of PostgreSQL concurrency issues - and then read the rest of that presentation.
As Kevin noted in the comments, if `col` isn't unique so you might match multiple rows, then different executions of the `UPDATE` could get different orderings. This can happen if they choose different plans (say one goes via a `PREPARE` and `EXECUTE` and another is direct, or you're messing with the `enable_` GUCs) or if the plan they all use produces an unstable sort of equal values. If they get the rows in a different order then tx1 will lock one tuple, tx2 will lock another, then they'll each try to get locks on each other's already-locked tuples. PostgreSQL will abort one of them with a deadlock exception. This is yet another good reason why all your database code should always be prepared to retry transactions.
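You can reproduce that deadlock by hand by taking row locks in opposite orders. A sketch, assuming two rows identified by a hypothetical `id` column:

    -- Session 1:
    BEGIN;
    SELECT 1 FROM test WHERE id = 1 FOR UPDATE;  -- locks row 1

    -- Session 2:
    BEGIN;
    SELECT 1 FROM test WHERE id = 2 FOR UPDATE;  -- locks row 2

    -- Session 1: blocks, waiting for session 2's lock on row 2.
    SELECT 1 FROM test WHERE id = 2 FOR UPDATE;

    -- Session 2: closes the cycle; PostgreSQL detects the deadlock and
    -- aborts one of the two sessions with "ERROR: deadlock detected".
    SELECT 1 FROM test WHERE id = 1 FOR UPDATE;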
If you're careful to make sure concurrent `UPDATE`s always get the same rows in the same order you can still rely on the behaviour described in the first part of the answer.
Frustratingly, PostgreSQL doesn't offer `UPDATE ... ORDER BY`, so ensuring that your updates always select the same rows in the same order isn't as simple as you might wish. A `SELECT ... FOR UPDATE ... ORDER BY` followed by a separate `UPDATE` is often safest.
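A sketch of that pattern, assuming the table has a unique `id` column to give a stable lock order:

    BEGIN;
    -- Lock every target row in a consistent order first ...
    SELECT 1 FROM test WHERE col = 2 ORDER BY id FOR UPDATE;
    -- ... then update them. The locks are already held, so concurrent
    -- runs of this transaction can't deadlock on these rows.
    UPDATE test SET col = 1 WHERE col = 2;
    COMMIT;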
If you're doing queries with multiple phases, involving multiple tuples, or conditions other than equality you can get surprising results that differ from the results of a serial execution. In particular, concurrent runs of anything like:
    UPDATE test SET col = 1 WHERE col = (SELECT t.col FROM test t ORDER BY t.col LIMIT 1);
or other efforts to build a simple "queue" system will *fail* to work how you expect: concurrent runs can all pick the same row, and after the winner commits the losers' re-checks no longer match, so they update zero rows instead of claiming the next item. See the PostgreSQL docs on concurrency and this presentation for more info.
If you want a work queue backed by a database there are well-tested solutions that handle all the surprisingly complicated corner cases. One of the most popular is PgQ. There's a useful PgCon paper on the topic, and a Google search for 'postgresql queue' is full of useful results.
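On PostgreSQL 9.5 and later you can also build a safe single-item claim yourself using `FOR UPDATE SKIP LOCKED`. A sketch, with illustrative table and column names:

    -- Claim one unclaimed item. Concurrent workers skip rows that are
    -- already locked instead of blocking or grabbing the same item.
    UPDATE queue
    SET claimed = true
    WHERE id = (
        SELECT id FROM queue
        WHERE NOT claimed
        ORDER BY id
        LIMIT 1
        FOR UPDATE SKIP LOCKED
    )
    RETURNING id;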
* BTW, instead of a `LOCK TABLE` you can use `SELECT 1 FROM test WHERE col = 2 FOR UPDATE;` to obtain a write lock on just that one tuple. That'll block updates against it but won't block writes to other tuples or block any reads. That allows you to simulate different kinds of concurrency issues.