Using django haystack autocomplete with elasticsearch to search for digits/numbers?

Submitted by 雨燕双飞 on 2019-12-12 09:51:42

Question


I'm using Django Haystack backed by Elasticsearch for autocomplete, and I'm having trouble searching for digits in a field.

For example, I have a field called 'name' on an object type that has some values like this:

['NAME', 'NAME2', 'NAME7', 'ANOTHER NAME 8', '7342', 'SOMETHING ELSE', 'LAST ONE 7']

and I'd like to use autocomplete to search for all objects with the number '7' in the name.

I've set up my search_index with this field:

name_auto = indexes.EdgeNgramField(model_attr='name')

and I'm using a search query like so:

SearchQuerySet().autocomplete(name_auto='7')

However, this search returns no results. I believe this is because Haystack's default edgengram_analyzer uses Elasticsearch's "lowercase" tokenizer, which splits text on every non-letter character and therefore throws out digits entirely.
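To make that suspicion concrete, here is a rough pure-Python imitation of what a letters-only tokenizer followed by an edge n-gram filter (min_gram 2, max_gram 15) would emit for the sample names. The helper names `lowercase_tokenize`, `edge_ngrams` and `index_terms` are made up for illustration, and this only approximates Elasticsearch's behavior; it is not its actual implementation.

```python
import re


def lowercase_tokenize(text):
    """Approximate a letters-only tokenizer: split on non-letters, then lowercase."""
    # [^\W\d_]+ matches runs of letters only (no digits, no underscore).
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Return the prefixes of `token` whose length is between min_gram and max_gram."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


def index_terms(text):
    """Terms that would end up in the index under this approximation."""
    terms = []
    for token in lowercase_tokenize(text):
        terms.extend(edge_ngrams(token))
    return terms


names = ['NAME', 'NAME2', 'NAME7', 'ANOTHER NAME 8', '7342',
         'SOMETHING ELSE', 'LAST ONE 7']

for name in names:
    print(repr(name), index_terms(name))
```

Under this approximation no indexed term contains a digit, and '7342' produces no terms at all, which would be consistent with the empty result for '7'.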

So, I found elasticstack, which allows customizing the haystack/elasticsearch backend, but I can't seem to configure the ELASTICSEARCH_INDEX_SETTINGS correctly to get the functionality I want.

The default settings look like this:

ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "synonym_analyzer" : {
                    "type": "custom",
                    "tokenizer" : "standard",
                    "filter" : ["synonym"]
                },
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_ngram", "synonym"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_edgengram"]
                }
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15
                },
                "synonym" : {
                    "type" : "synonym",
                    "ignore_case": "true",
                    "synonyms_path" : "synonyms.txt"
                }
            }
        }
    }
}

I've tried to alter the edgengram_analyzer block in a number of ways without success, and adding something like this

"token_chars": [ "letter", "digit" ]

to the "haystack_ngram_tokenizer" has not worked either.

Can someone help me determine how to use haystack/elasticsearch/autocomplete to search for digits? Or will I have to split the 'name' field into all possible n-grams myself and then use a standard matching search? Any help would be greatly appreciated.
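For reference, the manual fallback mentioned above (generating every n-gram of the name yourself) could look roughly like the sketch below. `all_ngrams` is a hypothetical helper, not part of Haystack or Elasticsearch, and the set it returns would still have to be stored in some indexed field of your own choosing.

```python
def all_ngrams(text, min_gram=1, max_gram=15):
    """Return every substring of `text` with length in [min_gram, max_gram], lowercased."""
    text = text.lower()
    grams = set()
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # longer sizes from this start would also run past the end
            grams.add(text[start:end])
    return grams


names = ['NAME', 'NAME2', 'NAME7', 'ANOTHER NAME 8', '7342',
         'SOMETHING ELSE', 'LAST ONE 7']

# Exact-match lookup of the query against each name's n-gram set.
matches = [name for name in names if '7' in all_ngrams(name)]
print(matches)
```

Note that this keeps spaces inside n-grams and grows quickly with the length of the name, so it is only a fallback if the analyzer cannot be configured to do the same work.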

Thanks a lot!


Answer 1:


This write-up solved the problem for me: http://silentsokolov.github.io/2014/09/03/django-haystack-elasticsearch-prombiemy-avtodopolnieniia.html

The article is written in Russian, so you may want to run it through Google Translate.
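Independently of the linked article (this is not a summary of it), one direction that may be worth trying is to stop using the letters-only "lowercase" tokenizer in edgengram_analyzer and use the "standard" tokenizer plus the "lowercase" token filter instead, so that digits survive tokenization. The sketch below only builds the settings dictionary. `DEFAULT_ANALYSIS` is a trimmed copy of the defaults quoted in the question, `with_digit_friendly_edgengram` is a hypothetical helper, and none of this has been verified against a live cluster.

```python
import copy

# Trimmed copy of the default analysis settings from the question
# (only the parts relevant to the edge n-gram analyzer).
DEFAULT_ANALYSIS = {
    "analyzer": {
        "edgengram_analyzer": {
            "type": "custom",
            "tokenizer": "lowercase",
            "filter": ["haystack_edgengram"],
        },
    },
    "filter": {
        "haystack_edgengram": {
            "type": "edgeNGram",
            "min_gram": 2,
            "max_gram": 15,
        },
    },
}


def with_digit_friendly_edgengram(analysis):
    """Return a copy whose edgengram_analyzer keeps digits (assumed fix, untested)."""
    patched = copy.deepcopy(analysis)
    patched["analyzer"]["edgengram_analyzer"] = {
        "type": "custom",
        # "standard" keeps digits; lowercasing moves to a token filter.
        "tokenizer": "standard",
        "filter": ["lowercase", "haystack_edgengram"],
    }
    return patched


ELASTICSEARCH_INDEX_SETTINGS = {
    "settings": {"analysis": with_digit_friendly_edgengram(DEFAULT_ANALYSIS)},
}
```

Two caveats apply. The index has to be rebuilt after any analyzer change. Also, edge n-grams only cover the start of each token and min_gram is 2 here, so a one-character query such as '7' may still not match 'NAME7'; matching a digit anywhere in a token would likely need the nGram filter with min_gram lowered to 1.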



Source: https://stackoverflow.com/questions/25827783/using-django-haystack-autocomplete-with-elasticsearch-to-search-for-digits-numbe
