ElasticSearch - issue with sub term aggregation with array fields

Question

I have the two following documents:

{
  "title": "The Avengers",
  "year": 2012,
  "casting": [
    {
      "name": "Robert Downey Jr.",
      "category": "Actor"
    },
    {
      "name": "Chris Evans",
      "category": "Actor"
    }
  ]
}

and:

{
  "title": "The Judge",
  "year": 2014,
  "casting": [
    {
      "name": "Robert Downey Jr.",
      "category": "Producer"
    },
    {
      "name": "Robert Duvall",
      "category": "Actor"
    }
  ]
}

I would like to perform aggregations based on two fields: casting.name and casting.category.

I tried a TermsAggregation on the casting.name field with a sub-aggregation, which is another TermsAggregation on the casting.category field.

The problem is that for the "Chris Evans" entry, ElasticSearch creates buckets for ALL categories (Actor, Producer), whereas it should create only one bucket (Actor).

It seems that there is a Cartesian product between all casting.category occurrences and all casting.name occurrences. It behaves like this with array fields (casting), whereas I don't have the problem with simple fields (such as title or year).

I also tried to use nested aggregations, though perhaps not correctly, and ElasticSearch throws an error saying that casting.category is not a nested field.

Any idea here?

Answer 1:

By default, Elasticsearch flattens arrays of objects, so the link between each name and its category is lost. Internally you will get:

{  
"title":"The Judge",
"year":2014,
"casting.name": ["Robert Downey Jr.","Robert Duvall"],
"casting.category": ["Producer", "Actor"]
}
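
To make the effect of that flattening concrete, here is a rough Python sketch. It is only a simplified model of the behaviour described above, not how Elasticsearch is implemented; the helper names flatten and terms_with_sub_terms are made up for this illustration. It flattens the two documents from the question and then counts categories per name, one document at a time.

```python
from collections import defaultdict

# The two documents from the question.
movies = [
    {"title": "The Avengers", "year": 2012, "casting": [
        {"name": "Robert Downey Jr.", "category": "Actor"},
        {"name": "Chris Evans", "category": "Actor"}]},
    {"title": "The Judge", "year": 2014, "casting": [
        {"name": "Robert Downey Jr.", "category": "Producer"},
        {"name": "Robert Duvall", "category": "Actor"}]},
]


def flatten(doc):
    """Hypothetical helper: turn the casting array into one multi-valued
    field per sub-field, as in the flattened document shown above."""
    flat = {k: v for k, v in doc.items() if k != "casting"}
    for entry in doc["casting"]:
        for key, value in entry.items():
            flat.setdefault("casting." + key, []).append(value)
    return flat


def terms_with_sub_terms(docs, field, sub_field):
    """Hypothetical helper: for every value of `field`, count the documents
    carrying each value of `sub_field`. Pairing inside a document is lost,
    so every name is combined with every category of the same document."""
    buckets = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        for value in set(doc[field]):
            for sub_value in set(doc[sub_field]):
                buckets[value][sub_value] += 1
    return {name: dict(counts) for name, counts in buckets.items()}


flat_docs = [flatten(movie) for movie in movies]
result = terms_with_sub_terms(flat_docs, "casting.name", "casting.category")

for name, categories in sorted(result.items()):
    print(name, categories)
```

In this model, with only the two sample documents, the spurious bucket shows up for "Robert Duvall", who gets a "Producer" count purely because he shares a document with a producer. Any name that appears in a document with mixed categories is affected the same way.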

If you want to keep the relationship between each name and its category, you'll need to use either nested objects or a parent-child relationship.

To do a nested mapping you'd need to do something like this:

  "mappings": {
    "movies": {
      "properties": {
        "title" : { "type": "string" },
        "year" : { "type": "integer" },
        "casting": {
          "type": "nested", 
          "properties": {
            "name":    { "type": "string" },
            "category": { "type": "string" }
          }
        }
      }
    }
  }
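
Once casting is mapped as nested, the aggregation itself has to be wrapped in a nested aggregation that points at the casting path. The sketch below builds such a request body as a Python dict and prints it as JSON. The aggregation names (casting, names, categories) are arbitrary labels chosen for this example, and the body is untested against a live cluster. One further assumption to check: with plain analyzed string fields, terms buckets are built per token rather than per full name, so the name and category fields probably also need "index": "not_analyzed" (or the keyword type in later versions).

```python
import json

# Request body for a search with a nested aggregation on "casting".
# The aggregation names are arbitrary labels for this illustration.
nested_agg_body = {
    "size": 0,  # only the aggregation results are of interest here
    "aggs": {
        "casting": {
            # Step into the nested casting objects so that each
            # name/category pair is aggregated as its own unit.
            "nested": {"path": "casting"},
            "aggs": {
                "names": {
                    "terms": {"field": "casting.name"},
                    "aggs": {
                        "categories": {
                            "terms": {"field": "casting.category"}
                        }
                    },
                }
            },
        }
    },
}

print(json.dumps(nested_agg_body, indent=2))
```

The printed JSON can be sent as the body of a _search request against the index holding the nested mapping. Existing documents would need to be reindexed after changing the mapping.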

Source: https://stackoverflow.com/questions/27776428/elasticsearch-issue-with-sub-term-aggregation-with-array-fields

Tags

ElasticSearch

aggregation

term