Clustered scoring in Elasticsearch

Question

Suppose I have a complex query in Elasticsearch 6.2 that can return the following hits:

"hits" : [
  {
    ...
    "_score" : 100,
    "_source" : { ... }
    ...
  },
  {
    ...
    "_score" : 99,
    "_source" : { ... }
    ...
  },
  {
    ...
    "_score" : 50,
    "_source" : { ... }
    ...
  },
  {
    ...
    "_score" : 49,
    "_source" : { ... }
    ...
  }
]

Or the same query can return:

"hits" : [
  {
    ...
    "_score" : 10,
    "_source" : { ... }
    ...
  },
  {
    ...
    "_score" : 9.9,
    "_source" : { ... }
    ...
  },
  {
    ...
    "_score" : 2,
    "_source" : { ... }
    ...
  },
  {
    ...
    "_score" : 1,
    "_source" : { ... }
    ...
  }
]

As you can see, the scores are unevenly distributed, and there are groups of items with close scores. I need the result set to include only the items from the top group. I can't provide a reasonable min_score, because the absolute score values can differ greatly for different query parameters. Is there any way to make Elasticsearch return the top-scored group regardless of the actual absolute values? Thank you in advance.

Answer 1:

As far as I know, Elasticsearch does not provide a way to cut off hits based on relative score. To do that, you would need to know the maximum score in advance, and it can vary widely depending on the search query and on the current state of the index. One not very elegant way to achieve this is to get the maximum score from a first request that limits the result size to one, and then use a min_score relative to that maximum in a second request to filter the results. Alternatively, the same can be achieved by filtering the results of a regular query manually on the client side.
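Both options can be sketched roughly as follows. This is only an illustration, not a tested integration: the helper names `filter_top_group` and `with_relative_min_score` and the `ratio` value of 0.9 are assumptions made for the example, and the hits are reduced to their `_score` field. The first helper does the client-side filtering; the second only builds the body for the second request, given a maximum score that was already read from the first (size 1) response.

```python
from typing import Any, Dict, List


def filter_top_group(hits: List[Dict[str, Any]], ratio: float = 0.9) -> List[Dict[str, Any]]:
    """Client-side option: keep hits scoring at least `ratio` times the best score.

    `ratio` is a hypothetical tuning knob, not an Elasticsearch setting.
    """
    if not hits:
        return []
    max_score = max(hit["_score"] for hit in hits)
    threshold = ratio * max_score
    return [hit for hit in hits if hit["_score"] >= threshold]


def with_relative_min_score(body: Dict[str, Any], max_score: float, ratio: float = 0.9) -> Dict[str, Any]:
    """Two-request option: copy the search body and add a min_score derived
    from the maximum score returned by the first request."""
    new_body = dict(body)  # shallow copy so the caller's body is not modified
    new_body["min_score"] = ratio * max_score
    return new_body


# Scores taken from the two result sets in the question.
first = [{"_score": 100}, {"_score": 99}, {"_score": 50}, {"_score": 49}]
second = [{"_score": 10}, {"_score": 9.9}, {"_score": 2}, {"_score": 1}]

print([hit["_score"] for hit in filter_top_group(first)])   # [100, 99]
print([hit["_score"] for hit in filter_top_group(second)])  # [10, 9.9]
```

A fixed ratio only works when the top group stays within that fraction of the best score; if the gap between groups varies a lot, the ratio has to be tuned per use case.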

Source: https://stackoverflow.com/questions/52179259/clasterized-scoring-in-elasticsearch

Tags

ElasticSearch

relevance