Question
This question is similar to my other question, which Val answered.
I have an index containing 3 documents.
{
"firstname": "Anne",
"lastname": "Borg"
}
{
"firstname": "Leanne",
"lastname": "Ray"
}
{
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
When I search for "Ann", I would like Elasticsearch to return all 3 of these documents (because they all match the term "Ann" to a degree). However, I would like Leanne Ray to have a lower score (relevance ranking), because the search term "Ann" appears at a later position in this document than it does in the other two documents.
Here are my index settings...
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"token_chars": [
"letter",
"digit",
"custom"
],
"custom_token_chars": "'-",
"min_gram": "1",
"type": "ngram",
"max_gram": "2"
}
}
}
},
"mappings": {
"properties": {
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"middlename": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"copy_to": [
"full_name"
]
},
"full_name": {
"type": "text",
"analyzer": "my_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
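As a rough illustration of what an ngram tokenizer with min_gram 1 and max_gram 2 produces, the Python sketch below approximates the analysis chain above. It is not Elasticsearch's implementation: the function name `ngrams` is made up for this example, and lowercasing is applied up front for simplicity rather than as a separate filter step.

```python
def ngrams(text, min_gram=1, max_gram=2):
    """Approximate an ngram tokenizer (hypothetical helper, not an ES API).

    Lowercases the text, splits on any character that is not a letter,
    a digit, an apostrophe or a hyphen, then emits every gram of each chunk.
    """
    allowed_extra = "'-"
    chunks, current = [], []
    for ch in text.lower():
        if ch.isalnum() or ch in allowed_extra:
            current.append(ch)
        elif current:
            chunks.append("".join(current))
            current = []
    if current:
        chunks.append("".join(current))

    grams = []
    for chunk in chunks:
        # For each start offset, emit grams of every allowed length.
        for start in range(len(chunk)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(chunk):
                    grams.append(chunk[start:start + size])
    return grams


print(ngrams("Ann"))
print(ngrams("Anne"))
print(ngrams("Leanne"))
```

Every gram produced for "Ann" also appears among the grams for "Leanne", so gram overlap by itself does not say where in the name the match starts.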
The following query brings back the expected documents, but attributes a higher score to Leanne Ray than to Anne Borg.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "Ann",
"fields": ["full_name"]
}
},
"should": {
"match": {
"full_name": "Ann"
}
}
}
}
}
Here are the results...
"hits": [
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "2",
"_score": 6.6333585,
"_source": {
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
},
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "1",
"_score": 6.142234,
"_source": {
"firstname": "Leanne",
"lastname": "Ray"
}
},
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "3",
"_score": 6.079495,
"_source": {
"firstname": "Anne",
"lastname": "Borg"
}
}
Using an ngram token filter and an ngram tokenizer together seems to fix this problem...
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"filter": [
"ngram"
],
"tokenizer": "ngram"
}
}
}
},
"mappings": {
"properties": {
"firstname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"lastname": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"middlename": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"copy_to": [
"full_name"
]
},
"full_name": {
"type": "text",
"analyzer": "my_analyzer",
"search_analyzer": "my_analyzer"
}
}
}
}
The same query brings back the expected results with the desired relative scoring. Why does this work? Note that in the first setup above I am using an ngram tokenizer with a lowercase filter; the only difference here is that I am using an ngram filter instead of the lowercase filter.
Here are the results. Notice that Leanne Ray scored lower than both Anne Borg and Anne M Stone, as desired.
"hits": [
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "3",
"_score": 4.953257,
"_source": {
"firstname": "Anne",
"lastname": "Borg"
}
},
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "2",
"_score": 4.87168,
"_source": {
"firstname": "Anne",
"middlename": "M",
"lastname": "Stone"
}
},
{
"_index": "contacts_4",
"_type": "_doc",
"_id": "1",
"_score": 1.0364896,
"_source": {
"firstname": "Leanne",
"lastname": "Ray"
}
}
By the way, this query also brings back a lot of false positives when the index contains other documents as well. This is not a big problem because these false positives have very low scores relative to the scores of the desirable hits, but it is still not ideal. For example, if I add {firstname: Gideon, lastname: Grossma} to the index, the above query will return that document in the result set as well, albeit with a much lower score than the documents containing the string "Ann".
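One likely reason for these false positives is that single-character grams are indexed, so a query for "Ann" shares the grams "a" and "n" with almost any name. The Python sketch below is only an approximation of that overlap (the helpers `gram_set` and `shared_grams` are hypothetical names, not part of Elasticsearch), and the second call with larger gram sizes is an untested assumption about how the overlap would change, not a recommended setting.

```python
def gram_set(value, min_gram=1, max_gram=2):
    """Return the set of character grams of a single lowercased value."""
    value = value.lower()
    return {
        value[i:i + n]
        for i in range(len(value))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(value)
    }


def shared_grams(query, names, min_gram=1, max_gram=2):
    """Grams that the query has in common with any of the given names."""
    query_grams = gram_set(query, min_gram, max_gram)
    indexed = set()
    for name in names:
        indexed |= gram_set(name, min_gram, max_gram)
    return sorted(query_grams & indexed)


# Grams of size 1 to 2, as in the settings above: "a" and "n" overlap.
print(shared_grams("Ann", ["Gideon", "Grossma"]))
# Hypothetical larger grams (size 2 to 3): nothing overlaps in this sketch.
print(shared_grams("Ann", ["Gideon", "Grossma"], min_gram=2, max_gram=3))
```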
Answer 1:
The answer is the same as in the linked thread. Since you're ngramming all the indexed data, it works the same way with Ann as with Anne. You'll get the exact same response (see below), though with different scores:
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "5Jr-DHIBhYuDqANwSeiw",
"_score" : 4.8442974,
"_source" : {
"firstname" : "Anne",
"lastname" : "Borg"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "5pr-DHIBhYuDqANwSeiw",
"_score" : 4.828779,
"_source" : {
"firstname" : "Anne",
"middlename" : "M",
"lastname" : "Stone"
}
},
{
"_index" : "test",
"_type" : "_doc",
"_id" : "5Zr-DHIBhYuDqANwSeiw",
"_score" : 0.12874341,
"_source" : {
"firstname" : "Leanne",
"lastname" : "Ray"
}
}
]
UPDATE
Here is a modified query that you can use to check for parts (i.e. ann vs anne). Again, the casing makes no difference here, since the analyzer lowercases everything before indexing.
{
"query": {
"bool": {
"must": {
"query_string": {
"query": "ann",
"fields": [
"full_name"
]
}
},
"should": [
{
"match_phrase_prefix": {
"firstname": {
"query": "ann",
"boost": "10"
}
}
},
{
"match_phrase_prefix": {
"lastname": {
"query": "ann",
"boost": "10"
}
}
}
]
}
}
}
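If the request body is assembled in client code, a small helper can build the same structure for any search term. This is a minimal sketch: the function name `build_prefix_boost_query` and its `boost` default are assumptions for illustration, and the boost is passed as a number here instead of the string "10" used above.

```python
import json


def build_prefix_boost_query(term, boost=10):
    """Build the bool query above for an arbitrary term (hypothetical helper)."""
    # One boosted prefix clause per name field, as in the query above.
    prefix_clauses = [
        {"match_phrase_prefix": {field: {"query": term, "boost": boost}}}
        for field in ("firstname", "lastname")
    ]
    return {
        "query": {
            "bool": {
                # Required ngram match against the combined full_name field.
                "must": {
                    "query_string": {"query": term, "fields": ["full_name"]}
                },
                # Optional clauses that only add to the score when they match.
                "should": prefix_clauses,
            }
        }
    }


print(json.dumps(build_prefix_boost_query("ann"), indent=2))
```

The `should` clauses only add to the score when a token in `firstname` or `lastname` starts with the term, which is what pushes "Anne" above "Leanne".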
Source: https://stackoverflow.com/questions/61768534/assign-a-higher-score-to-matches-containing-the-search-query-at-an-earlier-posit