Filter Elasticsearch results to contain only unique documents based on one field value

旧时难觅i 2020-12-05 15:27

All my documents have a uid field with an ID that links the document to a user. There are multiple documents with the same uid.

I want to p

2 answers
  • 2020-12-05 15:35

    You need a top_hits aggregation.

    And for your specific case:

    {
      "query": {
        "multi_match": {
          ...
        }
      },
      "aggs": {
        "top-uids": {
          "terms": {
            "field": "uid"
          },
          "aggs": {
            "top_uids_hits": {
              "top_hits": {
                "sort": [
                  {
                    "_score": {
                      "order": "desc"
                    }
                  }
                ],
                "size": 1
              }
            }
          }
        }
      }
    }
    

    The query above runs your multi_match query and groups the results by uid. For each uid bucket it returns only one document: the highest-scoring one, because the documents in each bucket are sorted by _score in descending order before the top hit is taken.
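
    To turn that aggregation response into a flat list of unique documents, you could walk the buckets on the client side. The following is only a sketch: the function name first_hit_per_uid and the sample response are made up for illustration, and the response is trimmed to the fields the code reads. The aggregation names top-uids and top_uids_hits are the ones used in the query above.

```python
from typing import Any, Dict, List


def first_hit_per_uid(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the single top hit from each uid bucket (hypothetical helper)."""
    buckets = response["aggregations"]["top-uids"]["buckets"]
    unique_docs = []
    for bucket in buckets:
        # Each bucket carries its own nested hits list from the top_hits sub-aggregation.
        hits = bucket["top_uids_hits"]["hits"]["hits"]
        if hits:
            unique_docs.append(hits[0])
    return unique_docs


# Hand-written sample response (an assumption, not real cluster output).
sample_response = {
    "aggregations": {
        "top-uids": {
            "buckets": [
                {
                    "key": "u1",
                    "doc_count": 3,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "a", "_score": 2.5, "_source": {"uid": "u1"}}]}
                    },
                },
                {
                    "key": "u2",
                    "doc_count": 1,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "b", "_score": 1.1, "_source": {"uid": "u2"}}]}
                    },
                },
            ]
        }
    }
}

unique_docs = first_hit_per_uid(sample_response)
print([doc["_id"] for doc in unique_docs])
```

    Keep in mind that a terms aggregation returns only the top 10 buckets by default, ordered by document count rather than by score, so you may need to set its size parameter.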

  • 2020-12-05 15:47

    Elasticsearch 5.3 added support for field collapsing. You should be able to do something like:

    GET /_search
    {
      "query": {
        "multi_match" : {
          "query":    "this is a test", 
          "fields": [ "subject", "message", "uid" ] 
        }
      },
      "collapse" : {
        "field" : "uid" 
      },
      "size": 20,
      "from": 100
    }
    

    The benefit of using field collapsing instead of a top_hits aggregation is that you can paginate the collapsed results with from and size.
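
    As a rough illustration of that pagination, the request body can be built per page. This is a sketch under assumptions: collapsed_page_body is a hypothetical helper, pages are counted from zero, and sending the body to the cluster is left out. The field names and query text are the ones from the example above.

```python
from typing import Any, Dict, List


def collapsed_page_body(text: str, fields: List[str], page: int,
                        page_size: int = 20) -> Dict[str, Any]:
    """Build a search body that collapses on uid and selects one page (hypothetical helper)."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": {"multi_match": {"query": text, "fields": fields}},
        # One hit per distinct uid value.
        "collapse": {"field": "uid"},
        "size": page_size,
        # Offset of the first result on a zero-based page.
        "from": page * page_size,
    }


body = collapsed_page_body("this is a test", ["subject", "message", "uid"], page=5)
print(body["from"], body["size"])
```

    With page=5 and the default page size of 20, this produces the same from and size values as the request above.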
