Question
I am using Elasticsearch with Haystack to provide search. I want users to be able to search in languages other than English; I am currently trying Greek.
How can I ignore accents while searching? For example, if I enter Ανδρέας (with accents), it returns the matching results.
But when I enter Ανδρεας, it returns no results. The search engine should return any results that contain "Ανδρέας" as well as "Ανδρεας" (the second one is unaccented).
Can someone point out how to resolve this?
Please let me know if I need to post my Elasticsearch settings, search_indexes, etc.
EDIT:
Here's my index settings:
ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "myanalyzer_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "greek_lowercase_filter",
                        "my_stop_filter",
                        "greek_stem_filter",
                        "english_stem_filter",
                        "my_edge_ngram_filter",
                        "asciifolding"
                    ]
                },
                "myanalyzer_index": {
                    "type": "custom",
                    "tokenizer": "edgeNGram",
                    "filter": [
                        "greek_lowercase_filter",
                        "my_stop_filter",
                        "greek_stem_filter",
                        "english_stem_filter",
                        "my_edge_ngram_filter",
                        "asciifolding"
                    ]
                },
            },
            "tokenizer": {
                "my_edge_ngram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": "2",
                    "max_gram": "18",
                    "token_chars": ["letter"]
                }
            },
            "filter": {
                "my_edge_ngram_filter": {
                    "type": "edgeNGram",
                    "min_gram": 3,
                    "max_gram": 18
                },
                "greek_stem_filter": {
                    "type": "stemmer",
                    "name": "greek"
                },
                "greek_lowercase_filter": {
                    "type": "lowercase",
                    "language": "greek"
                },
                "english_stem_filter": {
                    "type": "stemmer",
                    "name": "english"
                },
                "my_stop_filter": {
                    "type": "stop",
                    "stopwords": ["_greek_", "_english_"]
                }
            }
        }
    }
}
This is what I have in search_index.py:
class ProfileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    sorted_title = indexes.CharField(model_attr='title', indexed=False, stored=True)
    employment_history = indexes.EdgeNgramField(model_attr='employment_history', null=True)

    def get_model(self):
        return SellerProfile

    def index_queryset(self, using=None):
        return self.get_model().objects.all()
.........
And here's the template:
{{ object.user.get_full_name }}
{{ object.title }}
{{ object.bio }}
{{ object.employment_history }}
{{ object.education }}
I am querying like this:
results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρεας')
and
results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρέας')
Thanks.
Answer 1:
You need to add the asciifolding token filter to your analysis/query pipeline: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html
It strips accents from your words, so you can find them later whether or not you search with accents.
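To make the idea of accent stripping concrete, here is a minimal Python sketch of what "folding" a word to its unaccented form means. It is only an illustration using the standard library's unicodedata module, not Elasticsearch's implementation: the asciifolding filter maps characters to ASCII equivalents where they exist, so its output for a given script may differ from this. The helper name strip_accents is hypothetical.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining marks (accents) removed.

    Hypothetical helper for illustration only; this is not what
    Elasticsearch runs internally.
    """
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category "Mn") and keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose so the result uses precomposed characters where possible.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    accented = "Ανδρέας"
    plain = "Ανδρεας"
    # Both spellings reduce to the same form, so they would match each other
    # if the same normalization were applied at index time and query time.
    print(strip_accents(accented) == strip_accents(plain))
```

The point of the sketch is that the same normalization has to be applied to both the indexed text and the query text; if only one side is folded, the accented and unaccented spellings still will not match.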
Source: https://stackoverflow.com/questions/23593770/ignore-accents-in-elastic-search-with-haystack