Is get_or_create() thread-safe?

一向 2020-12-24 11:46

I have a Django model that can only be accessed using get_or_create(session=session), where session is a foreign key to another Django model.

Since I am

4 Answers
  • 2020-12-24 12:26

    Threading is one problem, but get_or_create is also broken for any serious use under MySQL's default isolation level (REPEATABLE READ):

    • How do I deal with this race condition in django?
    • Why doesn't this loop display an updated object count every five seconds?
    • https://code.djangoproject.com/ticket/13906
    • http://www.no-ack.org/2010/07/mysql-transactions-and-django.html
  • 2020-12-24 12:40

    Actually, it's not thread-safe. If you look at the code of the QuerySet object's get_or_create method, this is basically what it does:

    try:
        return self.get(**lookup), False
    except self.model.DoesNotExist:
        params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
        params.update(defaults)
        obj = self.model(**params)
        sid = transaction.savepoint(using=self.db)
        obj.save(force_insert=True, using=self.db)
        transaction.savepoint_commit(sid, using=self.db)
        return obj, True
    

    So two threads might both find that the instance does not exist in the DB and each start creating a new one, then save them one after the other.
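To make that interleaving concrete, here is a minimal sketch that replays it by hand in a single thread, using the standard-library sqlite3 module instead of Django. The `profile` table and the `get`/`create` helpers are hypothetical stand-ins for the model and the two halves of get_or_create; they are not Django's code.

```python
import sqlite3

# Hypothetical table standing in for the Django model.
# Note: there is deliberately no UNIQUE constraint on session_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (id INTEGER PRIMARY KEY, session_id INTEGER)")


def get(session_id):
    """The 'get' half: return the matching row, or None if there is none."""
    return conn.execute(
        "SELECT id FROM profile WHERE session_id = ?", (session_id,)
    ).fetchone()


def create(session_id):
    """The 'create' half: insert a new row unconditionally."""
    conn.execute("INSERT INTO profile (session_id) VALUES (?)", (session_id,))


# Replay the unlucky ordering of two threads, A and B, step by step.
a_found = get(1)  # A looks: nothing there
b_found = get(1)  # B looks before A has inserted: nothing there either
if a_found is None:
    create(1)     # A inserts
if b_found is None:
    create(1)     # B inserts a duplicate

row_count = conn.execute(
    "SELECT COUNT(*) FROM profile WHERE session_id = 1"
).fetchone()[0]
print(row_count)  # prints 2
```

With real threads the ordering is not guaranteed, which is why the problem tends to show up only occasionally under load.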

  • 2020-12-24 12:42

    NO, get_or_create is not atomic.

    It first asks the DB whether a matching row exists; the database returns, Python checks the result; if no row exists, it creates one. Between the get and the create anything can happen, including some other code creating a row that matches the get criteria.

    For instance, in your specific case, if the user has two pages open (or several AJAX requests are performed) at the same time, every get might fail and every request would then create a new row, all with the same session.

    It is thus important to use get_or_create only when the duplication will be caught by the database through a unique or unique_together constraint, so that even though multiple threads can reach save(), only one will succeed, and the others will raise an IntegrityError that you can catch and deal with.

    If you use get_or_create with (a set of) fields that are not unique in the database, you will create duplicates in your database, which is rarely what you want.

    More generally: do not rely on your application to enforce uniqueness and avoid duplicates in your database! That's the database's job! (Well, unless you wrap your critical functions in some OS-level locks, but I would still suggest using the database.)

    With these warnings in mind, get_or_create, used correctly, is an easy-to-read, easy-to-write construct that perfectly complements the database's integrity checks.

    Refs and citations:

    • http://groups.google.com/group/django-developers/browse_thread/thread/905f79e350525c95/0af3a41de4f4ce06
    • http://groups.google.com/group/django-developers/browse_thread/thread/f0b3381b2620e7db/8eae2f6087e550bb
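The advice above (let a unique constraint reject the duplicate, then catch the IntegrityError) can be sketched roughly as follows. This uses the standard-library sqlite3 module rather than Django so that it runs on its own; the `profile` table and the helper names `fetch`, `insert_or_fetch`, and `get_or_create` are assumptions made for illustration, not Django's implementation.

```python
import sqlite3

# Hypothetical table; the UNIQUE constraint is what makes the pattern safe.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile (id INTEGER PRIMARY KEY, session_id INTEGER UNIQUE)"
)


def fetch(session_id):
    """Return the id of the row for this session, or None if absent."""
    row = conn.execute(
        "SELECT id FROM profile WHERE session_id = ?", (session_id,)
    ).fetchone()
    return None if row is None else row[0]


def insert_or_fetch(session_id):
    """Try to insert; if the database rejects a duplicate, read the winner's row."""
    try:
        cur = conn.execute(
            "INSERT INTO profile (session_id) VALUES (?)", (session_id,)
        )
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Someone else created the row between our get and our insert.
        return fetch(session_id), False


def get_or_create(session_id):
    """Return (row_id, created), relying on the database to enforce uniqueness."""
    existing = fetch(session_id)
    if existing is not None:
        return existing, False
    return insert_or_fetch(session_id)


first = get_or_create(1)     # no row yet, so this call creates it
second = get_or_create(1)    # the row exists, so the get succeeds
# Simulate losing the race: skip the get and go straight to the insert.
loser = insert_or_fetch(1)

total = conn.execute("SELECT COUNT(*) FROM profile").fetchone()[0]
```

However the calls interleave, the table ends up with a single row per session, because the losing insert is refused by the database rather than by application logic.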
  • 2020-12-24 12:42

    I was having this problem with a view that calls get_or_create.

    I was using Gunicorn with multiple workers, so to test it I changed the number of workers to 1, and this made the problem disappear.

    The simplest solution I found was to lock the table for access. I used this decorator to do the lock per view (for PostgreSQL):

    http://www.caktusgroup.com/blog/2009/05/26/explicit-table-locking-with-postgresql-and-django/

    EDIT: I wrapped the lock statement in that decorator in a try/except to deal with DB engines with no support for it (SQLite while unit testing in my case):

    try:
        cursor.execute('LOCK TABLE %s IN %s MODE' % (model._meta.db_table, lock))
    except DatabaseError: 
        pass
    