What's the denominator for ElasticSearch scores?

*爱你&永不变心* submitted on 2019-12-13 06:20:50

Question


I have a search which has multiple criteria.

Each criterion (grouped by should) has a different weighted score.

ElasticSearch returns a list of results, each with a score that seems arbitrary to me, because I can't find a denominator for that score.

My question is - how can I represent each score as a ratio?

Dividing each score by max_score would not work since it'll show the best match as a 100% match with the search criteria.


Answer 1:


The _score calculation depends on the combination of queries used. For instance, a simple query like:

{ "match": { "title": "search" }}

would use Lucene's TFIDFSimilarity, combining:

  • term frequency (TF): how many times does the term "search" appear in the title field of this document? The more often, the higher the score.

  • inverse document frequency (IDF): how many times does the term "search" appear in the title field of all documents in the index? The more often, the lower the score.

  • field norm: how long is the title field? The longer the field, the lower the score. (Shorter fields like title are considered to be more important than longer fields like body.)

  • a query normalization factor, which can be ignored here.
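
As a rough illustration of why these factors give no natural denominator, here is a minimal Python sketch modeled on the formulas documented for Lucene's classic TF/IDF similarity (tf as the square root of the term frequency, idf as 1 + ln(numDocs / (docFreq + 1)), field norm as 1 / sqrt(number of terms)). The function names and the simplified product are assumptions made for illustration; the real implementation also applies the query normalization factor and stores norms with reduced precision, and newer Elasticsearch versions default to BM25, where the formulas differ.

```python
import math

def tf(freq):
    # Term frequency factor: grows with the number of occurrences in the field.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms get a larger factor.
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_norm(num_terms):
    # Field-length norm: shorter fields get a larger factor.
    return 1.0 / math.sqrt(num_terms)

def toy_score(freq, doc_freq, num_docs, num_terms):
    # Simplified single-term score (hypothetical helper, not Lucene's exact formula).
    return tf(freq) * idf(doc_freq, num_docs) * field_norm(num_terms)

if __name__ == "__main__":
    # Same term, same index, but a 4-term title versus a 16-term title.
    print(toy_score(freq=1, doc_freq=9, num_docs=1000, num_terms=4))
    print(toy_score(freq=1, doc_freq=9, num_docs=1000, num_terms=16))
```

Every input (occurrence count, index size, field length) can vary freely, so the result has no fixed upper bound to divide by.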

On the other hand, a bool query like this:

"bool": {
    "should": [
        { "match": { "title": "foo" }},
        { "match": { "title": "bar" }},
        { "match": { "title": "baz" }}
    ]
}

would calculate the _score for each clause that matches, add them together, then multiply by the number of matching clauses divided by the total number of clauses (and once again have the query normalization factor applied).
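
A toy Python sketch of that combination step, assuming the classic Lucene coordination factor (the summed score is scaled by matching clauses over total clauses). The function name and the use of None for a non-matching clause are illustrative assumptions, and query normalization is left out. Newer versions that score with BM25 simply sum the matching clauses, so this applies only to the classic similarity described here.

```python
def bool_should_score(clause_scores):
    """Combine per-clause scores for a bool/should query.

    clause_scores holds one entry per should clause:
    a float if the clause matched the document, None if it did not.
    """
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    # Coordination factor: fraction of clauses that matched.
    coord = len(matched) / len(clause_scores)
    return sum(matched) * coord

if __name__ == "__main__":
    # "foo" and "baz" matched, "bar" did not.
    print(bool_should_score([1.5, None, 0.5]))
```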

So it depends entirely on what queries you are using.

You can get a detailed explanation of how the _score was calculated by adding the explain parameter to your query:

curl -H 'Content-Type: application/json' 'localhost:9200/_search?explain' -d '
{
    "query": ....
}'

"My question is - how can I represent each score as a ratio?"

Without understanding what you want your query to do, it is impossible to answer this. Depending on your use case, you could use the function_score query to implement your own scoring algorithm.
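
One possible way to get a known denominator, sketched here under assumptions rather than taken from the original answer: replace the relevance score with a sum of fixed weights, one per criterion, using function_score with filter functions, score_mode "sum" and boost_mode "replace". The maximum possible score is then the sum of all weights. The field names (title, body, tags), the weights, and the helper names below are hypothetical.

```python
import json

# Hypothetical criteria: (query clause, weight). Adjust to your own mapping.
CRITERIA = [
    ({"match": {"title": "foo"}}, 3.0),
    ({"match": {"body": "foo"}}, 1.0),
    ({"term": {"tags": "featured"}}, 2.0),
]

def build_query(criteria):
    # Each matching criterion contributes its fixed weight; TF/IDF is discarded.
    return {
        "query": {
            "function_score": {
                "query": {
                    "bool": {
                        "should": [clause for clause, _ in criteria],
                        "minimum_should_match": 1,
                    }
                },
                "functions": [
                    {"filter": clause, "weight": weight}
                    for clause, weight in criteria
                ],
                "score_mode": "sum",      # add up the weights of matching functions
                "boost_mode": "replace",  # ignore the original query score
            }
        }
    }

def max_possible_score(criteria):
    # The denominator: the score of a document matching every criterion.
    return sum(weight for _, weight in criteria)

def as_ratio(score, criteria):
    return score / max_possible_score(criteria)

if __name__ == "__main__":
    print(json.dumps(build_query(CRITERIA), indent=2))
    print(as_ratio(3.0, CRITERIA))
```

With this design a hit's _score divided by max_possible_score says which share of the weighted criteria it satisfied, at the cost of ignoring how well each individual clause matched.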



Source: https://stackoverflow.com/questions/21346164/whats-the-denominator-for-elasticsearch-scores
