Question
I have the following documents:
south africa
north africa
I want to retrieve my "south africa" document with any of these search strings:
(a) s africa
(b) southafrica
(c) safrica
I defined the following filters and analyzers:
POST test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "south,s",
            "north,n"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "token_separator": ""
        }
      },
      "analyzer": {
        "my_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["shingle_filter"]
        },
        "my_shingle_synonym": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["shingle_filter", "synonym_filter"]
        },
        "my_synonym_shingle": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["synonym_filter", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {}
}
1) With my_shingle, "south africa" is indexed as: south, southafrica, africa
2) With my_shingle_synonym, "south africa" is indexed as: south, s, southafrica, africa
3) With my_synonym_shingle, "south africa" is indexed as: south, souths, southsafrica, s, safrica, africa
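The three token streams listed above can be reproduced with a small Python model of the two filters. This is a sketch under stated assumptions, not Lucene's implementation: the standard tokenizer is approximated by a whitespace split, the synonym filter emits each token followed by its equivalents, and the shingle filter walks the stream as a flat list, so a synonym stacked at the same position as "south" is treated like the next word. That last assumption is what produces the souths and southsafrica shingles reported for my_synonym_shingle. The function names (tokenize, synonym_filter, shingle_filter, analyze) are hypothetical.

```python
# Hypothetical model of the question's analyzers (not Lucene's implementation).
SYNONYMS = {"south": ["s"], "s": ["south"], "north": ["n"], "n": ["north"]}


def tokenize(text):
    # Stand-in for the standard tokenizer on plain lowercase words.
    return text.split()


def synonym_filter(tokens):
    # Models "south,s" and "north,n": each token, then its equivalents.
    out = []
    for tok in tokens:
        out.append(tok)
        out.extend(SYNONYMS.get(tok, []))
    return out


def shingle_filter(tokens, min_size=2, max_size=3, sep=""):
    # Unigram first, then the 2- and 3-token shingles that start at it,
    # joined with token_separator "". Stacked synonyms are treated as
    # ordinary consecutive tokens (assumption).
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(sep.join(tokens[i:i + size]))
    return out


# Filter order for each custom analyzer, as in the index settings.
ANALYZERS = {
    "my_shingle": [shingle_filter],
    "my_shingle_synonym": [shingle_filter, synonym_filter],
    "my_synonym_shingle": [synonym_filter, shingle_filter],
}


def analyze(analyzer, text):
    # Run the tokenizer, then each filter in the analyzer's order.
    tokens = tokenize(text)
    for token_filter in ANALYZERS[analyzer]:
        tokens = token_filter(tokens)
    return tokens


if __name__ == "__main__":
    for name in ANALYZERS:
        print(name, analyze(name, "south africa"))
```

The filter order is the whole difference between the analyzers in this model: applying synonyms after shingling only expands whole tokens, so no shingle ever contains s, while applying them before shingling lets s take part in shingles.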
So, with
(1) I find (b)
(2) I find (a) and (b)
(3) I find (a) and (c)
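These results can be reproduced if two things are assumed about the search side: the query string is only split on whitespace (no synonyms or shingles at search time), and the document is found only when every query term is among its indexed tokens. The sketch below is a hypothetical model with that AND-style term match over the token sets listed in the question; it is not how Elasticsearch actually analyzes or scores a query.

```python
# Indexed tokens for "south africa", as listed in the question per analyzer.
INDEXED = {
    "my_shingle": {"south", "southafrica", "africa"},
    "my_shingle_synonym": {"south", "s", "southafrica", "africa"},
    "my_synonym_shingle": {"south", "souths", "southsafrica", "s", "safrica", "africa"},
}

# The three search strings from the question.
QUERIES = ["s africa", "southafrica", "safrica"]


def finds(indexed_tokens, query):
    # Assumption: whitespace-split query, and every term must be indexed.
    return all(term in indexed_tokens for term in query.split())


def found_queries(analyzer):
    # Search strings that retrieve the document under the given analyzer.
    return [q for q in QUERIES if finds(INDEXED[analyzer], q)]


if __name__ == "__main__":
    for analyzer in INDEXED:
        print(analyzer, found_queries(analyzer))
```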
I want "south africa" to be indexed as: south, s, southafrica, safrica, africa
Answer 1:
You do not need a single analyzer that outputs every token you listed. Instead, index the same field with different analyzers using multi-fields.
Define the mapping of your field like this:
"mappings": {
  "your_mapping": {
    "properties": {
      "name": {
        "type": "string",
        "analyzer": "my_shingle",
        "fields": {
          "synonym": {
            "type": "string",
            "analyzer": "my_synonym_shingle"
          }
        }
      }
    }
  }
}
Index a sample document:
PUT test_index/your_mapping/1
{
  "name": "south africa"
}
Then query all variants of the name field (name and name.synonym) with a wildcard field expression:
GET test_index/your_mapping/_search
{
  "query": {
    "query_string": {
      "fields": [
        "name*"
      ],
      "query": "safrica"
    }
  }
}
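The multi-field approach can be sanity-checked with a small, hypothetical Python model: "south africa" and "north africa" are indexed into name (my_shingle) and name.synonym (my_synonym_shingle), and a name* query counts as a hit when any one of those fields contains every query term. The whitespace-split query, the all-terms rule, and the flat-list treatment of stacked synonyms in the shingle filter are simplifying assumptions; a real query_string query analyzes the text separately for each field and scores the matches. The names index_name and search are illustrative only.

```python
# Hypothetical model of the answer's multi-field mapping.
SYNONYMS = {"south": ["s"], "s": ["south"], "north": ["n"], "n": ["north"]}


def synonym_filter(tokens):
    # "south,s" / "north,n": emit each token followed by its equivalents.
    return [t for tok in tokens for t in [tok] + SYNONYMS.get(tok, [])]


def shingle_filter(tokens, min_size=2, max_size=3, sep=""):
    # Unigram plus 2- and 3-token shingles; stacked synonyms are treated
    # as consecutive tokens (assumption).
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(sep.join(tokens[i:i + size]))
    return out


def index_name(text):
    # name -> my_shingle, name.synonym -> my_synonym_shingle
    words = text.split()
    return {
        "name": set(shingle_filter(words)),
        "name.synonym": set(shingle_filter(synonym_filter(words))),
    }


DOCS = {1: "south africa", 2: "north africa"}
INDEX = {doc_id: index_name(text) for doc_id, text in DOCS.items()}


def search(query):
    # Assumption: a doc matches when any name* field contains every query term.
    terms = query.split()
    return [
        doc_id
        for doc_id, fields in INDEX.items()
        if any(all(t in tokens for t in terms) for tokens in fields.values())
    ]


if __name__ == "__main__":
    for q in ["s africa", "southafrica", "safrica"]:
        print(q, search(q))
```

In this model the union of the two fields' tokens contains every token the question asked for (south, s, southafrica, safrica, africa), which is why neither analyzer has to produce that exact set on its own.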
Source: https://stackoverflow.com/questions/40681178/elasticsearch-using-shingle-filter-with-synonym