Question
I am using haystack in our Django application for search, and search itself works fine. But I have an issue with realtime search. For realtime search I am using haystack's default RealtimeSignalProcessor (haystack.signals.RealtimeSignalProcessor). My model contains one many-to-many field. When only the data for this many-to-many field changes, the realtime signal processor does not seem to update the indexed data properly, and after such a change I get wrong search results.
It works after manually running the rebuild_index command. I think rebuild_index works because it clears the index first and then builds the indexed data again.
Can someone suggest a solution to this problem?
By the way, the relevant code follows.
Model:
class Message_forum(models.Model):
    message = models.ForeignKey(Message)
    tags = models.ManyToManyField(Tag, blank=True, null=True)  # this is many to many field
search_index.py:
class Message_forumIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    message = indexes.CharField(model_attr='message', null=True)
    tags = indexes.CharField(model_attr='tags', null=True)

    def get_model(self):
        return Message_forum

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

    def prepare_tags(self, obj):
        return [tag.tag for tag in obj.tags.all()]
index template:
{{ object.tags.tag }}
settings.py:
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'
I am using the latest version of haystack, with whoosh as the back-end.
Answer 1:
I figured it out after digging into haystack's code.
Haystack's default RealtimeSignalProcessor connects the post_save and post_delete signals of every model in the application: handle_save is called on post_save and handle_delete on post_delete. In handle_save, haystack looks up the search index for the sender. In my case, for changes to the tags (many-to-many) field, the Message_forum_tag model is passed as the sender. No index exists for this model in my search_index, since it is not one of my application's models but one that Django generated. So handle_save ignored any change on this model and did not update the indexed data for the changed object.
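To make the failure mode concrete, here is a toy model of the lookup described above. It is not haystack's code: the `INDEXES` dict, the local `NotHandled` class, and the `MessageForumTagThrough` name are stand-ins invented for illustration. It assumes only that the index is resolved from the signal's sender and that a missing index is skipped silently.

```python
class NotHandled(Exception):
    """Stand-in for the error raised when a model has no search index."""


class Message_forum:
    """Stand-in for the model that has a search index."""


class MessageForumTagThrough:
    """Hypothetical stand-in for the through model Django generates."""


# Toy registry: model class -> index (just a name here, for illustration).
INDEXES = {Message_forum: "Message_forumIndex"}


def get_index(model_class):
    """Resolve the index registered for a model class, or raise NotHandled."""
    try:
        return INDEXES[model_class]
    except KeyError:
        raise NotHandled(model_class.__name__)


def handle_save(sender, instance):
    """Return the index that would be updated, or None if the sender is skipped."""
    try:
        return get_index(sender)
    except NotHandled:
        # Swallowed silently, so the stale document stays in the index.
        return None


forum = Message_forum()
direct_save = handle_save(Message_forum, forum)           # sender has an index
m2m_change = handle_save(MessageForumTagThrough, forum)   # sender has no index
print(direct_save, m2m_change)
```

The second call returns None without any error, which matches the symptom in the question: nothing fails, the index simply is not refreshed.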
So I came up with two different solutions to this problem.
1. Create a custom realtime signal processor specific to my model Message_forum. In its setup method, connect the m2m_changed signal for each many-to-many field of Message_forum to handle_save, and pass Message_forum as the sender so that haystack passes the validation (not exactly validation; it is trying to get the index object for the sender) and updates the indexed data of the changed object.
2. Ensure that whenever any many-to-many field changes, the save method of its parent (here Message_forum.save()) is called. That always triggers the post_save signal, after which haystack updates the indexed data for the object.
I spent around 3 hours figuring this out. I hope it helps someone with the same problem.
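The first option could be sketched roughly as follows. This is a minimal sketch, not a tested signal processor: `make_m2m_receiver` and `reindex` are hypothetical names, and the demo uses plain Python objects instead of Django models so it runs on its own. It relies on two documented properties of Django's m2m_changed signal: it fires for both pre_* and post_* actions, and `reverse` is true when the change was made from the other side of the relation.

```python
# Actions sent by m2m_changed after the through-table rows have changed.
POST_ACTIONS = frozenset({"post_add", "post_remove", "post_clear"})


def make_m2m_receiver(reindex):
    """Build an m2m_changed-style receiver that forwards to `reindex`.

    `reindex` is assumed to accept `sender` and `instance` keyword arguments,
    as a signal processor's handle_save does.
    """
    def receiver(sender, instance, action, reverse=False, **kwargs):
        if action not in POST_ACTIONS:
            return False  # pre_* actions fire before the rows change
        if reverse:
            # The instance is the related object (for example a Tag), not
            # the indexed model; this sketch does not handle that case.
            return False
        # Pass the instance's own class, not the through model, as sender.
        reindex(sender=type(instance), instance=instance)
        return True
    return receiver


# Demo with a plain class standing in for the Django model.
class Message_forum:
    pass


calls = []
receiver = make_m2m_receiver(
    lambda sender, instance: calls.append((sender, instance)))
forum = Message_forum()
receiver(sender=object, instance=forum, action="pre_add", pk_set={1, 2})
receiver(sender=object, instance=forum, action="post_add", pk_set={1, 2})
print(len(calls))
```

In a real project the wiring would presumably be something like `models.signals.m2m_changed.connect(receiver, sender=Message_forum.tags.through, weak=False)` inside the processor's setup method, with `reindex` set to the processor's handle_save. `weak=False` is suggested here because Django holds receivers by weak reference by default, and a closure with no other reference could be garbage-collected.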
Answer 2:
I had a similar issue, but I went with a hybrid of Nikhil's number 1 and 2 options.
For a model called ContentItem with an m2m field called categories, I created a custom signal processor that extends the base one.
So I implemented a setup() duplicated from the source, but added the following line:
models.signals.m2m_changed.connect(self.handle_save, sender=ContentItem.categories.through)
And did the same with teardown() but with a similar disconnect line. I also extended handle_save and changed the line:
index = self.connections[using].get_unified_index().get_index(sender)
to
index = self.connections[using].get_unified_index().get_index(instance.__class__)
This means that the signal processor watches for m2m changes in the through table linking ContentItem to Category, but when an m2m change is made it looks up the index by the correct class, i.e. ContentItem instead of ContentItem.categories.through.
This seems to work for the most part, but if I delete a Category, m2m_changed doesn't fire even though the relationship is removed. It looks like this might be a bug in Django itself.
So I also added the following line to setup (and a disconnect to teardown):
models.signals.pre_delete.connect(self.handle_m2m_delete, sender=Category)
And created a method, handle_m2m_delete, duplicated from handle_save, which manually removes the relationship from the through table and saves the affected ContentItems (causing the original handle_save to then be triggered). This meant at least that I didn't have to remember to save the parent to update the index anywhere else in the code.
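The core of that delete handler might look something like the sketch below. The function name and its parameters are hypothetical, and the Fake* classes exist only so the example runs without Django. It assumes the default reverse accessor `contentitem_set` (that is, no `related_name` on the field) and uses only the `all()` and `remove()` manager calls plus `save()`.

```python
def resave_parents_before_delete(category, reverse_accessor="contentitem_set",
                                 field_name="categories"):
    """Detach `category` from every parent and save each parent.

    Meant to run before the category is deleted, so that each save()
    triggers post_save and the parent is reindexed without the category.
    """
    # Materialise the list first; removing while iterating would skip items.
    affected = list(getattr(category, reverse_accessor).all())
    for item in affected:
        getattr(item, field_name).remove(category)
        item.save()
    return affected


# --- Fakes standing in for Django objects, for demonstration only ---
class FakeManager:
    """Tiny stand-in for a related manager (all/remove only)."""
    def __init__(self, items):
        self.items = list(items)

    def all(self):
        return list(self.items)

    def remove(self, obj):
        self.items.remove(obj)


class FakeItem:
    def __init__(self, name):
        self.name = name
        self.categories = FakeManager([])
        self.saved = 0

    def save(self):
        self.saved += 1


class FakeCategory:
    def __init__(self, items):
        self.contentitem_set = FakeManager(items)
        for item in items:
            item.categories.items.append(self)


a, b = FakeItem("a"), FakeItem("b")
news = FakeCategory([a, b])
affected = resave_parents_before_delete(news)
print([item.name for item in affected])
```

A pre_delete receiver would receive the Category as its `instance` argument and could pass it straight to a function like this.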
Answer 3:
I can suggest an alternative solution, simpler than the complication of trying to watch all the right signals and ending up with a signal processor that has to know about all your m2m relationships.
It looks like this:
signals.py:
from collections import OrderedDict
from haystack.signals import RealtimeSignalProcessor


class BatchingSignalProcessor(RealtimeSignalProcessor):
    """
    RealtimeSignalProcessor connects to Django model signals
    we store them locally for processing later - must call
    ``flush_changes`` from somewhere else (eg middleware)
    """

    # Haystack instantiates this as a singleton
    _change_list = OrderedDict()

    def _add_change(self, method, sender, instance):
        key = (sender, instance.pk)
        if key in self._change_list:
            del self._change_list[key]
        self._change_list[key] = (method, instance)

    def handle_save(self, sender, instance, created, raw, **kwargs):
        method = super(BatchingSignalProcessor, self).handle_save
        self._add_change(method, sender, instance)

    def handle_delete(self, sender, instance, **kwargs):
        method = super(BatchingSignalProcessor, self).handle_delete
        self._add_change(method, sender, instance)

    def flush_changes(self):
        while True:
            try:
                (sender, pk), (method, instance) = self._change_list.popitem(last=False)
            except KeyError:
                break
            else:
                method(sender, instance)
middleware.py:
from haystack import signal_processor


class HaystackBatchFlushMiddleware(object):
    """
    for use with our BatchingSignalProcessor
    this should be placed *at the top* of MIDDLEWARE_CLASSES
    (so that it runs last)
    """

    def process_response(self, request, response):
        try:
            signal_processor.flush_changes()
        except AttributeError:
            # (in case we're not using our expected signal_processor)
            pass
        return response
settings.py:
MIDDLEWARE_CLASSES = (
    'myproject.middleware.HaystackBatchFlushMiddleware',
    ...
)
HAYSTACK_SIGNAL_PROCESSOR = 'myproject.signals.BatchingSignalProcessor'
I'm trying this out in my project, and it seems to work fine. I welcome any feedback or suggestions.
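The part of the processor above that does the real work is the way `_change_list` coalesces changes. The standalone sketch below reproduces only that bookkeeping with plain strings in place of model classes and bound methods; `add_change` and `flush` are illustrative names, not part of haystack. It shows that repeated changes to the same (sender, pk) collapse to the most recent one, and that the flush runs oldest entry first.

```python
from collections import OrderedDict

change_list = OrderedDict()


def add_change(method, sender, pk):
    """Record a pending change, replacing any earlier one for the same object."""
    key = (sender, pk)
    if key in change_list:
        # Drop the older entry so the new one moves to the end of the queue.
        del change_list[key]
    change_list[key] = method


def flush():
    """Drain the pending changes in insertion order and return them."""
    processed = []
    while change_list:
        (sender, pk), method = change_list.popitem(last=False)  # oldest first
        processed.append((method, sender, pk))
    return processed


add_change("save", "ContentItem", 1)
add_change("save", "ContentItem", 2)
add_change("save", "ContentItem", 1)     # saved again later in the request
add_change("delete", "ContentItem", 2)   # a delete supersedes the earlier save
result = flush()
print(result)
```

Because each object is indexed once, at the end of the request, the index is built from the state of the many-to-many rows at that point rather than from whatever existed when the first post_save fired.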
Source: https://stackoverflow.com/questions/19706083/django-haystack-indexing-is-not-working-for-many-to-many-field-in-model