Elastic Search Query for Distinct Nested Values

Question

I am using the High Level REST Client for Elasticsearch 6.2.2. Suppose I have the following two documents in index "DOCUMENTS" with type "DOCUMENTS":

{
   "_id": 1,
   "Name": "John",
   "FunFacts": {
       "FavColor": "Green",
       "Age": 32
   }
},
{
   "_id": 2,
   "Name": "Amy",
   "FunFacts": {
       "FavFood": "Pizza",
       "Age": 33
   }
}

I want to find out all of the distinct fun facts and their distinct values, ultimately returning an end result of

{
    "FavColor": ["Green"],
    "Age": [32, 33],
    "FavFood": ["Pizza"]
}

It is OK if this requires more than one query to Elasticsearch, but I would prefer a single query. Furthermore, the index may grow rather large, so as much of the work as possible must run on the ES instance.

The code below seems to produce a list of documents containing only FunFacts, but I would still have to perform the aggregation myself, which is highly undesirable.

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;

SearchRequest searchRequest = new SearchRequest("DOCUMENTS");
searchRequest.types("DOCUMENTS");

// Match every document, but return only the FunFacts part of _source.
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
String[] includes = {"FunFacts"};
String[] excludes = {"Name"};
searchSourceBuilder.fetchSource(includes, excludes);
searchRequest.source(searchSourceBuilder);

SearchResponse searchResponse = restHighLevelClient.search(searchRequest);

Can anyone point me in the right direction? I notice that nearly all of the Elasticsearch documentation comes in the form of curl commands, which does not help me much, as I am not well versed enough to translate such commands to Java.

Here is the plot twist: since users decide what their fun facts are, we cannot know ahead of time which keys will appear inside the FunFacts map. :/

Thanks, Matt

Answer 1:

You can do it all in one query by using aggregations. Assuming you have the following documents in your index

{
   "Name": "Jake",
   "FunFacts": {
       "FavFood": "Burgers",
       "Age": 32
   }
}

{
   "Name": "Amy",
   "FunFacts": {
       "FavFood": "Pizza",
       "Age": 33
   }
}

{
   "Name": "Alex",
   "FunFacts": {
       "FavFood": "Burgers",
       "Age": 28
   }
}

and you want to get the distinct "FavFood" values, you could use the following terms aggregation (see the Elasticsearch documentation on the terms aggregation):

{
  "aggs": {
    "disticnt_fun_facts": {
      "terms": { "field": "FunFacts.FavFood" }
    }
  }
}

which would return something along these lines:

{
  ...
  "hits": { ... },
  "aggregations": {
    "disticnt_fun_facts": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "burgers",
          "doc_count": 2
        },
        {
          "key": "pizza",
          "doc_count": 1
        }
      ]
    }
  }
}
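
The request body shown earlier targets a single field. As a rough sketch of how the same idea might extend to several fun facts in one request, the plain-Java helper below builds a body with one terms aggregation per field. The class and method names are hypothetical, and the field list is an assumption: since the keys are not known ahead of time, they would first have to be discovered, for example from the index mapping. With the High Level REST Client, the equivalent is presumably AggregationBuilders.terms(name).field(path), added through SearchSourceBuilder.aggregation(...).

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any Elasticsearch library.
class FunFactsAggRequest {

    // Builds a search body with one terms aggregation per field under FunFacts.
    // "size": 0 at the top level suppresses the hits, so only aggregations come back.
    // The per-aggregation "size" caps the number of buckets (Elasticsearch defaults to 10).
    // Assumption: field names contain no characters that need JSON escaping.
    static String buildBody(List<String> fields, int maxTermsPerField) {
        StringBuilder sb = new StringBuilder("{\"size\":0,\"aggs\":{");
        for (int i = 0; i < fields.size(); i++) {
            String field = fields.get(i);
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(field).append("\":{\"terms\":{\"field\":\"FunFacts.")
              .append(field).append("\",\"size\":").append(maxTermsPerField).append("}}");
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        // Field names taken from the example documents.
        System.out.println(buildBody(Arrays.asList("FavFood", "Age"), 100));
    }
}
```

Each aggregation is named after its field, so the response would carry one buckets array per fun fact.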

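For brevity, only the aggregations part of the response above is shown in full. The important thing to notice is the buckets array: each bucket represents one distinct term found (key) together with the number of documents containing it (doc_count). The lowercase keys indicate that the aggregation ran on an analyzed text field, which requires fielddata to be enabled on that field. With Elasticsearch's default dynamic mapping, aggregating on the FunFacts.FavFood.keyword sub-field is likely the better choice, and the keys would then keep their original case ("Burgers", "Pizza").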
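
To get from the buckets to the shape the question asks for, the doc_count values can simply be dropped. The following is a minimal sketch in plain Java; Bucket and DistinctValuesCollector are hypothetical stand-ins, not Elasticsearch classes. With the High Level REST Client, the buckets would presumably come from Terms.getBuckets(), with getKey() and getDocCount() on each bucket.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of any Elasticsearch library.
class DistinctValuesCollector {

    // Stand-in for one entry of an aggregation's "buckets" array.
    static final class Bucket {
        final Object key;
        final long docCount;

        Bucket(Object key, long docCount) {
            this.key = key;
            this.docCount = docCount;
        }
    }

    // Maps each aggregation name to the list of its bucket keys,
    // keeping the bucket order and discarding the document counts.
    static Map<String, List<Object>> distinctValues(Map<String, List<Bucket>> bucketsByAgg) {
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Bucket>> entry : bucketsByAgg.entrySet()) {
            List<Object> keys = new ArrayList<>();
            for (Bucket bucket : entry.getValue()) {
                keys.add(bucket.key);
            }
            result.put(entry.getKey(), keys);
        }
        return result;
    }

    public static void main(String[] args) {
        // Buckets copied from the sample response above.
        Map<String, List<Bucket>> buckets = new LinkedHashMap<>();
        buckets.put("FavFood", Arrays.asList(new Bucket("burgers", 2), new Bucket("pizza", 1)));
        System.out.println(distinctValues(buckets)); // prints {FavFood=[burgers, pizza]}
    }
}
```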

Hope that helps.

Source: https://stackoverflow.com/questions/49888164/elastic-search-query-for-distinct-nested-values

Tags

ElasticSearch

solr

elasticsearch-5

spring-data-elasticsearch

elasticsearch-6