EdgeNgramField min and max letters in django haystack

Submitted by £可爱£侵袭症+ on 2019-12-23 10:53:11

Question


Is there a way to restrict the size of the edge ngrams in django haystack indexing? For example, I create the ngram as follows:

#search_indexes.py
content_auto = indexes.EdgeNgramField(model_attr='name')

But I don't want to create two-letter ngrams; I want to set the minimum at 4 or 5.

As background, I am using django-haystack/elasticsearch, with bonsai on heroku.


Answer 1:


What you need to do is override the search mapping in Haystack's ElasticSearch backend.

In brief: extend the ElasticSearch backend and supply a new schema mapping, either by replacing it directly in the subclass or by importing it from settings.py.

from django.conf import settings
from haystack.backends.elasticsearch_backend import (ElasticsearchSearchBackend,
    ElasticsearchSearchEngine)

class MyElasticBackend(ElasticsearchSearchBackend):

    def __init__(self, connection_alias, **connection_options):
        super(MyElasticBackend, self).__init__(
                                connection_alias, **connection_options)
        MY_SETTINGS = {
            'settings': {
                "analysis": {
                    "analyzer": {
                        "ngram_analyzer": {
                            "type": "custom",
                            "tokenizer": "lowercase",
                            "filter": ["haystack_ngram"]
                        },
                        "edgengram_analyzer": {
                            "type": "custom",
                            "tokenizer": "lowercase",
                            "filter": ["haystack_edgengram"]
                        }
                    },
                    "tokenizer": {
                        "haystack_ngram_tokenizer": {
                            "type": "nGram",
                            "min_gram": 3,
                            "max_gram": 15,
                        },
                        "haystack_edgengram_tokenizer": {
                            "type": "edgeNGram",
                            "min_gram": 2,
                            "max_gram": 15,
                            "side": "front"
                        }
                    },
                    "filter": {
                        "haystack_ngram": {
                            "type": "nGram",
                            "min_gram": 3,
                            "max_gram": 15
                        },
                        "haystack_edgengram": {
                            "type": "edgeNGram",
                            "min_gram": 5,  # minimum prefix length used by edgengram_analyzer
                            "max_gram": 15
                        }
                    }
                }
            }
        }
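        # Shadow the class-level DEFAULT_SETTINGS on this instance so the
        # custom analysis settings are used when the index is created.
        self.DEFAULT_SETTINGS = MY_SETTINGS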


class ConfigurableElasticSearchEngine(ElasticsearchSearchEngine):
    backend = MyElasticBackend

For a fuller explanation see my write up about extending the ElasticSearch backend to customize the search mapping.
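
To see what raising `min_gram` does to the indexed terms, here is a rough pure-Python approximation of the `lowercase` tokenizer followed by an edge n-gram filter. It is only an illustration of the idea, not Elasticsearch's actual implementation, and the function name `edge_ngrams` is made up for this sketch.

```python
import re


def edge_ngrams(text, min_gram=5, max_gram=15):
    """Approximate a lowercase tokenizer plus an edge n-gram filter."""
    # Lowercase the text and split on anything that is not a letter,
    # roughly what the "lowercase" tokenizer does.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    grams = []
    for token in tokens:
        # Emit the prefixes of each token, from min_gram up to max_gram
        # characters; tokens shorter than min_gram yield nothing here.
        for size in range(min_gram, min(len(token), max_gram) + 1):
            grams.append(token[:size])
    return grams


print(edge_ngrams("Django Haystack", min_gram=2))
print(edge_ngrams("Django Haystack", min_gram=5))
```

With `min_gram=5` the short prefixes such as `dj` and `ha` are no longer generated, so queries shorter than five letters stop matching through the edge n-gram field.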




Answer 2:


It's actually quite simple. Create a folder called search_configuration inside your template directory. Then create a file named solr.xml in that folder and paste in the contents of the solr.xml file here. Finally, edit the EdgeNGramFilterFactory entry to set minGramSize as appropriate.



Source: https://stackoverflow.com/questions/18908131/edgengramfield-min-and-max-letters-in-django-haystack
