Why is Django returning stale cache data?

Is django-cache-machine really needed?

MyModel1.objects.all()[0]

Roughly translates to

SELECT * FROM app_mymodel LIMIT 1

Queries like this are always fast. There would not be a significant difference in speed whether you fetch the result from the cache or from the database.

When you use the cache manager you actually add a bit of overhead here, which might make things slightly slower. Most of the time that effort is wasted, because there may not even be a cache hit, as explained in the next section.

How django-cache-machine works

Whenever you run a query, CachingQuerySet will try to find that query in the cache. Queries are keyed by {prefix}:{sql}. If it’s there, we return the cached result set and everyone is happy. If the query isn’t in the cache, the normal codepath to run a database query is executed. As the objects in the result set are iterated over, they are added to a list that will get cached once iteration is done.

source: https://cache-machine.readthedocs.io/en/latest/
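
In plain Django terms, that cache-aside flow looks roughly like this. This is only a sketch of the pattern the docs describe, not cache-machine's actual internals, and the helper name run_cached is made up:

    import hashlib

    from django.core.cache import cache

    def run_cached(queryset, prefix='qs', timeout=300):
        # Key the cached result set by the query's generated SQL.
        sql = str(queryset.query)
        # Hash the SQL so the key stays short and memcached-safe.
        key = '%s:%s' % (prefix, hashlib.md5(sql.encode()).hexdigest())

        cached = cache.get(key)
        if cached is not None:           # cache hit: skip the database entirely
            return cached

        result = list(queryset)          # cache miss: run the query and materialise the rows
        cache.set(key, result, timeout)  # store the list for later identical queries
        return result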

Accordingly, since the two queries executed in your question are identical, the cache manager will fetch the second result set from memcache, provided the cache hasn't been invalidated.

The same link explains how cache keys are invalidated.

To support easy cache invalidation, we use “flush lists” to mark the cached queries an object belongs to. That way, all queries where an object was found will be invalidated when that object changes. Flush lists map an object key to a list of query keys.

When an object is saved or deleted, all query keys in its flush list will be deleted. In addition, the flush lists of its foreign key relations will be cleared. To avoid stale foreign key relations, any cached objects will be flushed when the object their foreign key points to is invalidated.
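
Conceptually, the flush-list bookkeeping amounts to something like this. This is illustrative only, not the library's real data structures, and the keys shown are made up:

    from django.core.cache import cache

    # A flush list maps an object's cache key to every cached query key
    # in which that object appeared.
    flush_lists = {
        'mymodel1:8887972990743179': ['qs:3f2a...', 'qs:9bc1...'],
    }

    def invalidate(obj_key):
        # On save or delete, drop every cached query the object was part of.
        for query_key in flush_lists.pop(obj_key, []):
            cache.delete(query_key)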

It's clear that saving or deleting an object results in many cached entries having to be invalidated, so you are slowing down those operations by using the cache manager. Also worth noting is that the invalidation documentation does not mention many-to-many fields at all. There is an open issue for this, and from your comment on that issue it's clear that you have discovered it too.

Solution

Chuck cache machine. Caching all queries is almost never worth it; it leads to all kinds of hard-to-find bugs and issues. The best approach is to optimize your tables and fine-tune your queries. If you find that a particular query is too slow, cache it manually.
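
For example, a single expensive query can be cached by hand with Django's low-level cache API. This is only a sketch; the cache key, filter and timeout below are made up, and the import path for MyModel1 is assumed:

    from django.core.cache import cache

    from myapp.models import MyModel1  # import path is an assumption

    def expensive_report():
        result = cache.get('expensive-report')
        if result is None:
            # Run the slow query once and keep the materialised list around.
            result = list(MyModel1.objects.filter(name__startswith='my-'))
            cache.set('expensive-report', result, 60 * 15)  # cache for 15 minutes
        return result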

This was my workaround solution:

    >>> m1 = MyModel1.objects.all()[0]
    >>> m1
    <MyModel1: ID: 8887972990743179; name: my-name-blahblah>

    >>> m2 = MyModel2.objects.all()[0]
    >>> m2.model1.all()
    []
    >>> m2.model1.add(m1)
    >>> m2.model1.all()   # stale: the add() above is not reflected in the cached result
    []

    >>> MyModel1.objects.invalidate(m1)
    >>> MyModel2.objects.invalidate(m2)
    >>> m2.save()
    >>> m2.model1.all()   # fresh result after manually invalidating both objects
    [<MyModel1: ID: 8887972990743179; name: my-name-blahblah>]
A. J. Parr

Have you considered hooking into the model signals to invalidate the cache when an object is added? For your case you should look at the m2m_changed signal.

Here is a small example that doesn't solve your problem outright, but it ties the workaround you gave above to the signals approach (I don't know django-cache-machine):

    from django.db.models.signals import m2m_changed

    def invalidate_m2m(sender, **kwargs):
        instance = kwargs.get('instance', None)
        action = kwargs.get('action', None)

        if action == 'post_add':
            # instance is the MyModel2 whose m2m field changed; drop its cached queries
            MyModel2.objects.invalidate(instance)

    m2m_changed.connect(invalidate_m2m, sender=MyModel2.model1.through)

A. J. Parr's answer is almost correct, but it forgets post_remove, and you can also bind it to every ManyToManyField like this:

    from django.db.models.signals import m2m_changed
    from django.dispatch import receiver

    @receiver(m2m_changed)
    def invalidate_cache_m2m(sender, instance, action, reverse, model, pk_set, **kwargs):
        # No sender is given, so this fires for every ManyToManyField in the project.
        if action in ['post_add', 'post_remove']:
            model.objects.invalidate(instance)