Handling race conditions in PostgreSQL

Submitted by 狂风中的少年 on 2020-12-29 03:48:11

Question


I have several workers, each holding its own connection to PostgreSQL. The workers manipulate different tables.

The workers handle parallel requests from outside the system. One of the tables being accessed is the table of users. When some information arrives, I first need to ensure there is a record for the user in the table. If there is no record, I wish to create one first.

I'm using the following idiom:

if [user does not exist] then [create user]

The code of [user does not exist] is:

SELECT id FROM myschema.users WHERE userId='xyz'

and I test whether any row is returned.

The (simplified) code of [create user] is:

INSERT INTO myschema.users VALUES ('xyz')

When my system handles parallel streams of different information concerning the same user, I often get PostgreSQL error:

Key (id)=(xyz) already exists

It happens because the SELECT command returns no rows, then another worker creates the user, and my worker attempts to do the same, resulting in a textbook concurrency error.
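The interleaving that produces this error can be sketched as two sessions racing on the same key (the side-by-side layout is an illustrative timeline, not runnable as a single script):

```sql
-- Session A                       -- Session B
BEGIN;
SELECT id FROM myschema.users
  WHERE userId = 'xyz';  -- 0 rows
                                   BEGIN;
                                   SELECT id FROM myschema.users
                                     WHERE userId = 'xyz';  -- 0 rows
                                   INSERT INTO myschema.users VALUES ('xyz');
                                   COMMIT;
INSERT INTO myschema.users VALUES ('xyz');
-- ERROR: duplicate key value violates unique constraint
```

Both sessions pass the existence check before either has committed its insert, so the check provides no protection.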

As I understood the PostgreSQL documentation, by default, whenever I implicitly start a transaction, the table stays locked until I commit. I'm not using autocommit, and I only commit in blocks, e.g. after the whole if-else block.

Indeed, I could put the if-else logic directly into SQL, but that does not solve my locking problem in general. I was assuming a "winner takes all" model: the first worker to execute the SELECT command would own the locks until it calls COMMIT.

I've read many different topics here on SO, but I'm still not sure what the right solution is. Should I use explicit locking of tables, since the implicit locking does not work? How can I ensure that only a single worker owns a table at a time?


Answer 1:


You have to pay attention to the transaction isolation level. It should be set to SERIALIZABLE.

The reason is phantom reads: the transaction doesn't lock the whole table, only the rows it has already read.

So, if another transaction inserts new rows, those rows were never locked, and the error appears.

SERIALIZABLE avoids this by preventing other transactions from interfering until this one has finished.

You can do this via

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
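A sketch of the asker's check-then-insert under this isolation level might look as follows. One caveat worth hedging: in PostgreSQL 9.1+, SERIALIZABLE does not actually block the other transaction; one of the conflicting transactions is aborted with SQLSTATE 40001 (serialization_failure), so the application must be prepared to retry from BEGIN.

```sql
BEGIN ISOLATION LEVEL SERIALIZABLE;

SELECT id FROM myschema.users WHERE userId = 'xyz';

-- only if the SELECT returned no rows:
INSERT INTO myschema.users VALUES ('xyz');

-- if a concurrent transaction created the same user, this COMMIT (or the
-- INSERT) fails with SQLSTATE 40001 instead of a duplicate-key error,
-- and the whole transaction should be retried
COMMIT;
```

The practical difference is that a serialization failure is a well-defined, retryable condition, whereas the duplicate-key error surfaces from inside an otherwise "successful" code path.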

Documentation: http://www.postgresql.org/docs/9.1/static/transaction-iso.html

If you want to learn more about this topic, I can highly recommend this video: http://www.youtube.com/watch?v=zz-Xbqp0g0A




Answer 2:


Actually, after some experimenting with ISOLATION LEVEL SERIALIZABLE as proposed by @maja, I discovered a much simpler mechanism:

PERFORM pg_advisory_lock(id);
...
-- do something that others must wait for
...
PERFORM pg_advisory_unlock(id);

where id is a BIGINT value which I may choose arbitrarily according to my application's logic.

This gave me both the power and the flexibility I was looking for.
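A minimal sketch of the get-or-create step guarded this way, wrapped in a PL/pgSQL function. Two assumptions here: `pg_advisory_xact_lock` is used instead of the lock/unlock pair (this variant releases automatically at COMMIT or ROLLBACK, so a failure path cannot leak the lock), and `hashtext` is used to derive the required integer key from the text user id, which risks rare hash collisions (two different users occasionally serializing against each other, which is harmless here):

```sql
CREATE OR REPLACE FUNCTION ensure_user(p_userid text) RETURNS void AS $$
BEGIN
    -- serialize all workers competing for the same user id;
    -- the lock is released automatically when the transaction ends
    PERFORM pg_advisory_xact_lock(hashtext(p_userid));

    IF NOT EXISTS (SELECT 1 FROM myschema.users WHERE userId = p_userid) THEN
        INSERT INTO myschema.users VALUES (p_userid);
    END IF;
END;
$$ LANGUAGE plpgsql;
```

Because the lock is taken before the existence check, the check-then-insert window from the question is closed: a second worker calling `ensure_user('xyz')` waits until the first one commits, then sees the freshly inserted row.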



Source: https://stackoverflow.com/questions/21464406/handling-race-conditions-in-postgresql
