Wrong score in Elasticsearch result

拜拜、爱过 submitted on 2021-01-27 18:34:34

Question


I am not getting the expected scores for my Elasticsearch query results.

ES query:

{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "(emergency) OR (emergency*) OR (*emergency) OR (*emergency*)",
            "fields": [
              "MDMGlobalData.Name1"
            ]
          }
        }
      ]
    }
  }
}
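To see why so many documents tie on this query, here is a minimal sketch of which of the four query_string clauses each name satisfies. It assumes a simplified analyzer (lowercase, split on whitespace) in place of Elasticsearch's standard analyzer, and uses Python's `fnmatch` as a stand-in for Lucene wildcard matching; `analyze` and `matched_clauses` are hypothetical helpers.

```python
from fnmatch import fnmatchcase

# The four clauses of the query_string above, written as wildcard patterns.
PATTERNS = ["emergency", "emergency*", "*emergency", "*emergency*"]


def analyze(text: str) -> list:
    # Simplified stand-in for the standard analyzer (assumption):
    # lowercase the text and split it on whitespace.
    return text.lower().split()


def matched_clauses(name: str) -> tuple:
    """For each clause, report whether any token of `name` matches it."""
    tokens = analyze(name)
    return tuple(any(fnmatchcase(tok, pat) for tok in tokens) for pat in PATTERNS)


for name in ["PARAGON EMERGENCY", "EMERGENCY MD", "COASTAL EMERGENCY", "EMERGENCY ONE"]:
    print(name, matched_clauses(name))
```

Every listed name matches all four clauses, so no name gains anything from the wildcard clauses over another; with the default rewrite method, Elasticsearch gives wildcard clauses a constant score anyway. A single token such as "nonemergency" would match only the two leading-wildcard clauses.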

ES result:

{
  "took": 29,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 798,
      "relation": "eq"
    },
    "max_score": 9.169065,
    "hits": [
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551037160",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "PARAGON EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551040507",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551076447",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "COASTAL EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551100746",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551090880",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "PAFFORD EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551106787",
        "_score": 9.169065,
        "_source": {
          "MDMGlobalData": {
            "Name1": "CAPROCK EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551021568",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "WILTON EMERGENCY"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551124137",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY ONE"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551125549",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY ONE"
          }
        }
      },
      {
        "_index": "customermasterdata",
        "_type": "_doc",
        "_id": "MDMCM551133066",
        "_score": 9.121077,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      }
    ]
  }
}

Ideally, the first results should be the documents whose Name1 is exactly "emergency" or starts with the word "emergency".

Also, how can the first several results have the same score when their Name1 values are different?

Because of this scoring, the results are in the wrong order. How can I correct the scores?
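To make the desired ordering concrete, here is a hypothetical client-side sort key that encodes the rule stated above: exact match first, then names starting with the word, then names containing it elsewhere. It is only an illustration of the requirement, not an Elasticsearch feature; `desired_rank` is a made-up helper and whitespace tokenization is an assumption.

```python
def desired_rank(name: str, word: str = "emergency") -> int:
    """Hypothetical ranking rule from the question:
    0 = exact match, 1 = starts with the word, 2 = contains it elsewhere, 3 = no match."""
    tokens = name.lower().split()
    if tokens == [word]:
        return 0
    if tokens and tokens[0] == word:
        return 1
    if word in tokens:
        return 2
    return 3


names = ["PARAGON EMERGENCY", "EMERGENCY MD", "COASTAL EMERGENCY", "EMERGENCY", "EMERGENCY ONE"]
# sorted() is stable, so names with the same rank keep their original relative order.
print(sorted(names, key=desired_rank))
```

Note that BM25 has no notion of token position, so a "starts with" preference like this cannot come from the default relevance score alone.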


Answer 1:


No, that need not be the case, because Elasticsearch uses Lucene's scoring function (BM25 by default).

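Reasons for the identical scores:

  1. Each document has only two terms in Name1: emergency and one other word.
  2. The term emergency matches exactly, and the field length (two terms) is the same in every document.
  3. emergency occurs once in each document, so the term frequencies are the same.
  4. The inverse document frequency (IDF) of emergency is the same for all of these documents.
  5. Every document matches the same query clauses, so the clause scores combine to the same total (older Lucene versions expressed this through the coord factor).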
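The list above can be checked against the BM25 formula itself. Below is a minimal sketch of the classic BM25 contribution of one term, using Elasticsearch's default parameters (k1 = 1.2, b = 0.75); the corpus statistics are made-up assumptions, not values from the question's index. The inputs are only term frequency, field length and corpus statistics, so the other word in the name never enters the calculation. Because IDF and average length are computed per shard by default, documents on different shards can get slightly different scores, which is likely why the question's results fall into two groups (9.169065 and 9.121077).

```python
import math


def bm25_term_score(tf: float, dl: int, avgdl: float, doc_count: int, doc_freq: int,
                    k1: float = 1.2, b: float = 0.75) -> float:
    """Classic BM25 contribution of a single query term to one document."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * dl / avgdl)  # field-length normalization
    return idf * tf * (k1 + 1) / (tf + norm)


# Hypothetical shard statistics (assumption, not taken from the question's index).
stats = dict(avgdl=2.4, doc_count=1000, doc_freq=50)

# In each name, "emergency" occurs once (tf=1) in a two-term field (dl=2);
# the second word is not an input to the formula at all.
scores = {name: bm25_term_score(tf=1, dl=2, **stats)
          for name in ["PARAGON EMERGENCY", "EMERGENCY MD", "COASTAL EMERGENCY"]}
print(scores)
```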

But a document with Emergency X Y Z would score lower than the documents you have. Its term frequency is still one, but its field is longer, and BM25's length normalization lowers the score of longer fields.

And a document containing only Emergency would score higher than all of them.

So identical scores are perfectly normal in your scenario: the query gives no signal for preferring one of these "emergency" names over another.

Update:

{
  "query": {
    "bool": {
      "must": {
        "term": {
          "MDMGlobalData.Name1": "emergency"
        }
      }
    }
  }
}

With the sample data, the output is:

"hits": [
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "iN1hKnMBojxRtp6HNI7d",
        "_score": 0.10938574,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "g91TKnMBojxRtp6Hto4q",
        "_score": 0.08701137,
        "_source": {
          "MDMGlobalData": {
            "Name1": "PARAGON EMERGENCY"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "hN1TKnMBojxRtp6H2I6A",
        "_score": 0.08701137,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "hd1TKnMBojxRtp6H_I6_",
        "_score": 0.08701137,
        "_source": {
          "MDMGlobalData": {
            "Name1": "COASTAL EMERGENCY"
          }
        }
      },
      {
        "_index": "emerge",
        "_type": "_doc",
        "_id": "h91VKnMBojxRtp6HYI4e",
        "_score": 0.07223585,
        "_source": {
          "MDMGlobalData": {
            "Name1": "EMERGENCY MD X"
          }
        }
      }
    ]
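These numbers can be reproduced by hand. A minimal sketch, assuming the "emerge" index holds exactly the five documents above in a single shard, each Name1 counts as one term per word, and scoring is classic BM25 with the default k1 = 1.2 and b = 0.75. Under those assumptions the computed scores match the output to float precision, and since tf = 1 everywhere, the ordering comes entirely from field length.

```python
import math

K1, B = 1.2, 0.75  # Elasticsearch's default BM25 parameters


def bm25(tf, dl, avgdl, doc_count, doc_freq, k1=K1, b=B):
    """Classic BM25: (k1 + 1) * idf * tf / (tf + k1 * (1 - b + b * dl / avgdl))."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    return (k1 + 1) * idf * tf / (tf + k1 * (1 - b + b * dl / avgdl))


# Assumption: the sample index contains exactly these five documents in one shard.
docs = ["EMERGENCY", "PARAGON EMERGENCY", "EMERGENCY MD", "COASTAL EMERGENCY", "EMERGENCY MD X"]
lengths = [len(name.split()) for name in docs]  # field length in terms
avgdl = sum(lengths) / len(lengths)             # 10 / 5 = 2.0

# Every document contains "emergency" exactly once, so doc_freq == doc_count.
scores = {name: bm25(tf=1, dl=dl, avgdl=avgdl, doc_count=len(docs), doc_freq=len(docs))
          for name, dl in zip(docs, lengths)}
for name, score in scores.items():
    print(f"{name:20} {score:.8f}")
```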


Source: https://stackoverflow.com/questions/62778671/wrong-score-in-elastic-search-result
