I have a Django model that can only be accessed using get_or_create(session=session), where session is a foreign key to another Django model.
Since I am
Threading is one problem, but get_or_create is broken for any serious usage in the default isolation level of MySQL:
Actually, it's not thread-safe. If you look at the code of the get_or_create method of the QuerySet object, what it basically does is the following:
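# Simplified excerpt from inside QuerySet.get_or_create
try:
    # Step 1: the "get" (a SELECT)
    return self.get(**lookup), False
except self.model.DoesNotExist:
    # Race window: another thread/process may insert a matching row right now
    params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
    params.update(defaults)
    obj = self.model(**params)
    # Step 2: the "create" (an INSERT inside a savepoint)
    sid = transaction.savepoint(using=self.db)
    obj.save(force_insert=True, using=self.db)
    transaction.savepoint_commit(sid, using=self.db)
    return obj, True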
So two threads might both find that the instance does not exist in the DB and each start creating a new one, then save them one after the other.
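A minimal, Django-free sketch of this check-then-act race, assuming a plain dict stands in for a table with no unique constraint and a threading.Barrier forces every thread to finish its "get" before any of them "creates". The names `naive_get_or_create`, `table` and `rows` are hypothetical and only illustrate the interleaving.

```python
import threading

table = {}   # stand-in for the DB table: session -> row (no unique constraint)
rows = []    # every row ever "inserted", so duplicates stay visible
N_THREADS = 4
barrier = threading.Barrier(N_THREADS)

def naive_get_or_create(session):
    found = table.get(session)      # step 1: the "get" (SELECT)
    barrier.wait()                  # race window: nobody continues until all have looked
    if found is not None:
        return found, False
    obj = {"session": session}      # step 2: the "create" (INSERT)
    rows.append(obj)
    table[session] = obj
    return obj, True

threads = [threading.Thread(target=naive_get_or_create, args=("abc",))
           for _ in range(N_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(rows))  # 4: every thread created a row for the same session
```

The barrier only makes the bad interleaving deterministic; in production it happens by chance, which is why the duplicates appear intermittently.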
No, get_or_create is not atomic.
It first asks the DB whether a matching row exists; the database returns, and Python checks the results; if no row exists, it creates one. In between the get and the create anything can happen, and a row matching the get criteria can be created by some other code.
For instance, regarding your specific issue: if the user has two pages open (or several AJAX requests are performed) at the same time, all the get calls might fail, and all of them would then create a new row with the same session.
It is thus important to use get_or_create only when duplication will be caught by the database through a unique / unique_together constraint, so that even though multiple threads can reach save(), only one will succeed, and the others will raise an IntegrityError that you can catch and deal with.
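A minimal sketch of this pattern, with Python's built-in sqlite3 standing in for Django and the production database (an assumption made so the example runs anywhere). The `profile` table, its UNIQUE `session_id` column and the helpers `fetch`, `insert_or_fetch` and `get_or_create_profile` are hypothetical; in a Django model the equivalent constraint would come from `unique=True`, a OneToOneField, or `unique_together`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, session_id TEXT UNIQUE)")

def fetch(conn, session_id):
    row = conn.execute(
        "SELECT id FROM profile WHERE session_id = ?", (session_id,)
    ).fetchone()
    return None if row is None else row[0]

def insert_or_fetch(conn, session_id):
    """The 'create' half: try the INSERT; if the UNIQUE constraint rejects it
    (another writer got there first), return the row that writer created."""
    try:
        with conn:  # commits on success, rolls back and re-raises on error
            cur = conn.execute(
                "INSERT INTO profile (session_id) VALUES (?)", (session_id,)
            )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        return fetch(conn, session_id), False

def get_or_create_profile(conn, session_id):
    pk = fetch(conn, session_id)              # the "get"
    if pk is not None:
        return pk, False
    return insert_or_fetch(conn, session_id)  # the "create"

print(get_or_create_profile(conn, "abc"))  # (1, True)
# A thread whose "get" missed before the row existed now reaches the INSERT:
print(insert_or_fetch(conn, "abc"))        # (1, False): duplicate refused by the DB
```

Without the UNIQUE constraint, the second INSERT would simply succeed and leave two rows for the same session.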
If you use get_or_create with (a set of) fields that are not unique in the database, you will create duplicates in your database, which is rarely what you want.
More generally: do not rely on your application to enforce uniqueness and avoid duplicates in your database! That's the database's job! (Well, unless you wrap your critical functions with some OS-level locks, but I would still suggest using the database.)
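For the OS-level lock alternative, a minimal sketch assuming a Unix host: `fcntl.flock` on a shared lock file serializes the get-then-create across threads and across worker processes on the same machine (not across machines). The lock-file path, `os_lock`, and `locked_get_or_create` are hypothetical, and the dict again stands in for a table without a unique constraint.

```python
import fcntl
import os
import tempfile
import threading
import time
from contextlib import contextmanager

# Hypothetical lock file; every worker process on the host must use the same path.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "get_or_create_demo.lock")

@contextmanager
def os_lock(path=LOCK_PATH):
    """Exclusive advisory flock(2) lock, honoured across threads and processes."""
    with open(path, "a") as fh:         # each caller gets its own open file description
        fcntl.flock(fh, fcntl.LOCK_EX)  # blocks until the current holder releases it
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

table = {}          # stand-in for a table with no unique constraint
rows_created = []

def locked_get_or_create(session):
    with os_lock():
        if session in table:            # "get"
            return table[session], False
        time.sleep(0.01)                # widen the window the lock now protects
        obj = {"session": session}      # "create"
        table[session] = obj
        rows_created.append(obj)
        return obj, True

threads = [threading.Thread(target=locked_get_or_create, args=("abc",))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(rows_created))  # 1
```

Every code path that writes the table has to take the same lock, which is exactly the kind of discipline a database constraint enforces for free.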
With these warnings in mind, get_or_create, used correctly, is an easy-to-read, easy-to-write construct that perfectly complements the database integrity checks.
I was having this problem with a view that calls get_or_create.
I was using Gunicorn with multiple workers, so to test it I changed the number of workers to 1, and this made the problem disappear.
The simplest solution I found was to lock the table for access. I used this decorator to do the lock per view (for PostgreSQL):
http://www.caktusgroup.com/blog/2009/05/26/explicit-table-locking-with-postgresql-and-django/
EDIT: I wrapped the lock statement in that decorator in a try/except to deal with DB engines with no support for it (SQLite while unit testing in my case):
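from django.db import DatabaseError

try:
    # PostgreSQL-only statement; SQLite has no LOCK TABLE and raises an error
    cursor.execute('LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock))
except DatabaseError:
    # Engine without table locking (e.g. SQLite in unit tests): run unlocked
    pass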
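A minimal sketch of such a per-view lock decorator, including the try/except fallback. It is not the decorator from the linked post: to keep it runnable it takes a generic DB-API connection factory instead of Django's, and the names `require_table_lock`, `get_connection`, `db_error`, and the chosen mode are assumptions. In Django you might pass `lambda: django.db.connection` and `django.db.DatabaseError`; note that PostgreSQL only accepts LOCK TABLE inside a transaction block and holds the lock until that transaction ends, so the wrapped view would also need to run inside one (e.g. `transaction.atomic()`).

```python
import functools
import sqlite3

# The eight table-lock modes PostgreSQL accepts in "LOCK TABLE ... IN <mode> MODE".
LOCK_MODES = frozenset({
    "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE",
    "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
})

def require_table_lock(get_connection, table, mode, db_error=sqlite3.DatabaseError):
    """Hypothetical decorator: take a PostgreSQL table lock before calling func.

    get_connection: zero-argument callable returning a DB-API connection
    table:          trusted table name (identifiers cannot be bound as parameters)
    db_error:       raised by engines without LOCK TABLE; it is swallowed
    """
    if mode not in LOCK_MODES:
        raise ValueError(f"{mode!r} is not a PostgreSQL lock mode")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            cursor = get_connection().cursor()
            try:
                # On PostgreSQL the lock is held until the transaction ends.
                cursor.execute(f"LOCK TABLE {table} IN {mode} MODE")
            except db_error:
                pass  # e.g. SQLite in unit tests: no LOCK TABLE, run unlocked
            return func(*args, **kwargs)
        return wrapper
    return decorator

conn = sqlite3.connect(":memory:")

@require_table_lock(lambda: conn, "profile", "SHARE ROW EXCLUSIVE")
def view(session_id):
    return f"handled {session_id}"

print(view("abc"))  # handled abc (SQLite rejected LOCK TABLE; the error was ignored)
```

SHARE ROW EXCLUSIVE is picked here because it conflicts with itself and with the ROW EXCLUSIVE lock taken by INSERTs, so two locked views serialize while plain SELECTs elsewhere still proceed.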