“Filter then Aggregation” or just “Filter Aggregation”?

Question

I've been working with ES recently and found that I can achieve almost the same result in two ways, but I have no clear idea of the DIFFERENCE between them.

"Filter then Aggregation"

POST kibana_sample_data_flights/_search
{
  "size": 0,
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      }
    }
  },
  "aggs": {
    "ca_weathers": {
      "terms": { "field": "DestWeather" }
    }
  }
}

"Filter Aggregation"

POST kibana_sample_data_flights/_search
{
  "size": 0,
  "aggs": {
    "ca": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      },
      "aggs": {
        "_weathers": {
           "terms": { "field": "DestWeather" } 
        }
      }
    }
  }
}
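To make the comparison concrete, here is a minimal in-memory Python model of what each request computes. It is a simplified model of the semantics, not how Elasticsearch executes the requests. The field names come from the two requests above; the sample documents and the helpers `term_matches`, `terms_buckets`, `query_then_terms` and `filter_agg_then_terms` are hypothetical.

```python
from collections import Counter

# Hypothetical sample documents using the fields from the requests above.
DOCS = [
    {"DestCountry": "CA", "DestWeather": "Rain"},
    {"DestCountry": "CA", "DestWeather": "Sunny"},
    {"DestCountry": "CA", "DestWeather": "Rain"},
    {"DestCountry": "US", "DestWeather": "Rain"},
    {"DestCountry": "IT", "DestWeather": "Cloudy"},
]

def term_matches(doc, field, value):
    """Naive stand-in for a `term` filter on a keyword field."""
    return doc.get(field) == value

def terms_buckets(docs, field):
    """Naive stand-in for a `terms` aggregation: value -> doc count."""
    return dict(Counter(d[field] for d in docs if field in d))

def query_then_terms(docs):
    # "Filter then Aggregation": the query narrows the docs first,
    # and the top-level terms aggregation only sees the matches.
    matched = [d for d in docs if term_matches(d, "DestCountry", "CA")]
    return {"hits_total": len(matched),
            "ca_weathers": terms_buckets(matched, "DestWeather")}

def filter_agg_then_terms(docs):
    # "Filter Aggregation": no query, so every doc is in the aggregation
    # context; the filter aggregation builds one bucket and the terms
    # sub-aggregation runs inside that bucket.
    bucket = [d for d in docs if term_matches(d, "DestCountry", "CA")]
    return {"hits_total": len(docs),
            "ca": {"doc_count": len(bucket),
                   "_weathers": terms_buckets(bucket, "DestWeather")}}

a = query_then_terms(DOCS)
b = filter_agg_then_terms(DOCS)
print(a["ca_weathers"] == b["ca"]["_weathers"])  # True
```

In this model the weather buckets are identical; what differs is the context. With the query, the hit count covers only the Canadian flights; with the filter aggregation, the hit count still covers every document and the Canadian subset lives inside the `ca` bucket.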

My Questions

Why are there two similar functions? I believe I'm misunderstanding something, but what's the difference, then? (Please ignore the result format; that's not what I'm asking ;p)
Which is better if I want to filter out unrelated/unmatched documents and then run the aggregation over a large number of documents?

Answer 1:

When you use it in "query", you're creating a context on ALL the docs in your index. In this case, it acts like a normal WHERE clause: SELECT * FROM index WHERE (my_filter_condition1 AND my_filter_condition2 OR my_filter_condition3...).

When you use it in "aggs", you're creating a context on the docs the query has already matched (which is ALL the docs if there is no query). Say you have a structure like:

#OPTION A
{
    "aggs":{
        "t_shirts" : {
            "filter" : { "term": { "type": "t-shirt" } }
        }
    }
}

Without a "query", this selects exactly the same documents as:

#OPTION B
{
    "query":{
        "bool": {
            "filter" : { "term": { "type": "t-shirt" } }
        }
    }
}

BUT the results will be returned in different fields.

In Option A, the results will be returned in the aggregations field.

In Option B, the results will be returned in the hits field.
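A rough Python mock of where each option's result lands, assuming the 7.x-style response in which `hits.total` is an object with `value` and `relation`. The sample documents and the `mock_option_a` / `mock_option_b` functions are hypothetical, and real responses carry more keys (`took`, `_shards`, `_id`, `_score`, and so on).

```python
# Hypothetical sample documents.
DOCS = [
    {"type": "t-shirt", "name": "red tee"},
    {"type": "t-shirt", "name": "blue tee"},
    {"type": "hoodie", "name": "grey hoodie"},
]

def is_t_shirt(doc):
    # Stand-in for {"term": {"type": "t-shirt"}}
    return doc["type"] == "t-shirt"

def mock_option_a(docs):
    """Option A: filter aggregation, no query.
    Every doc is a hit; the t-shirt count sits under aggregations."""
    return {
        "hits": {
            "total": {"value": len(docs), "relation": "eq"},
            "hits": [{"_source": d} for d in docs],
        },
        "aggregations": {
            "t_shirts": {"doc_count": sum(1 for d in docs if is_t_shirt(d))}
        },
    }

def mock_option_b(docs):
    """Option B: bool filter in the query, no aggregation.
    Only t-shirts are hits, and there is no aggregations key."""
    matched = [d for d in docs if is_t_shirt(d)]
    return {
        "hits": {
            "total": {"value": len(matched), "relation": "eq"},
            "hits": [{"_source": d} for d in matched],
        }
    }

a = mock_option_a(DOCS)
b = mock_option_b(DOCS)
print(a["aggregations"]["t_shirts"]["doc_count"])  # 2
print(b["hits"]["total"]["value"])                  # 2
```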

I would recommend always applying your filters in the query part, so that subsequent aggregations work on the already filtered docs. Also, aggregations are more expensive than queries.

Hope this is helpful! :D

Answer 2:

Used in isolation, both filters are equivalent: if you don't load any results (hits), there is no difference. But you can combine listing and aggregations: query or filter your docs for the listing, and calculate aggregations on a bucket that the aggs filter narrows further. Like this:

POST kibana_sample_data_flights/_search
{
  "size": 100,
  "query": {
    "bool": {
      "filter": {
        "term": {
          ... some other filter
        }
      }
    }
  },
  "aggs": {
    "ca_filter": {
      "filter": {
        "term": {
          "DestCountry": "CA"
        }
      },
      "aggs": {
        "ca_weathers": {
          "terms": { "field": "DestWeather" }
        }
      }
    }
  }
}
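The combination can be modelled in plain Python as two successive narrowings: the query picks the hits, and the filter aggregation narrows that set again for its sub-aggregation. This is a sketch under assumptions: the sample documents are made up, and the `OriginCountry == "US"` condition is a hypothetical stand-in chosen only for illustration, since the request leaves the query filter unspecified. The `search` function is likewise hypothetical.

```python
from collections import Counter

# Hypothetical sample flight documents.
DOCS = [
    {"OriginCountry": "US", "DestCountry": "CA", "DestWeather": "Rain"},
    {"OriginCountry": "US", "DestCountry": "CA", "DestWeather": "Sunny"},
    {"OriginCountry": "US", "DestCountry": "IT", "DestWeather": "Rain"},
    {"OriginCountry": "DE", "DestCountry": "CA", "DestWeather": "Snow"},
]

def search(docs, query_filter, agg_filter, size=100):
    # 1. Query context: only docs passing query_filter are hits,
    #    and only they are visible to the aggregations.
    hits = [d for d in docs if query_filter(d)]
    # 2. The filter aggregation narrows the hits again into one bucket.
    bucket = [d for d in hits if agg_filter(d)]
    # 3. The terms sub-aggregation runs inside that bucket only.
    weathers = dict(Counter(d["DestWeather"] for d in bucket))
    return {
        "hits": hits[:size],
        "ca_filter": {"doc_count": len(bucket), "ca_weathers": weathers},
    }

result = search(
    DOCS,
    # Hypothetical stand-in for "... some other filter".
    query_filter=lambda d: d["OriginCountry"] == "US",
    agg_filter=lambda d: d["DestCountry"] == "CA",
)
print(len(result["hits"]))  # 3
print(result["ca_filter"])  # {'doc_count': 2, 'ca_weathers': {'Rain': 1, 'Sunny': 1}}
```

Note that the DE-to-CA "Snow" flight matches the aggregation filter but not the query, so it appears in neither the hits nor the bucket.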

More likely, though, you will need it the other way around: compute aggregations on all docs to display summary information, while listing only the docs that match a more specific query. In that case, you need to combine the aggregations with post_filter.
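A sketch of the post_filter pattern in Python. The request body is written as a dict in Query DSL form (`post_filter` is a top-level search parameter; the field values are only examples), and a tiny in-memory evaluator shows the order of operations: aggregations see every query match, while post_filter trims only the returned hits. The evaluator (`clause_ok`, `run`) and the sample documents are hypothetical and handle only `match_all` and single-field `term` clauses.

```python
from collections import Counter

# Hypothetical sample documents.
DOCS = [
    {"DestCountry": "CA", "DestWeather": "Rain"},
    {"DestCountry": "US", "DestWeather": "Sunny"},
    {"DestCountry": "IT", "DestWeather": "Rain"},
]

# Request body in Query DSL form; the field values are only examples.
BODY = {
    "query": {"match_all": {}},
    "aggs": {"all_weathers": {"terms": {"field": "DestWeather"}}},
    "post_filter": {"term": {"DestCountry": "CA"}},
}

def clause_ok(doc, clause):
    # Toy evaluator: supports only match_all and a single-field term clause.
    if "match_all" in clause:
        return True
    field, value = next(iter(clause["term"].items()))
    return doc.get(field) == value

def run(docs, body):
    # 1. The query selects the matching docs.
    matched = [d for d in docs if clause_ok(d, body["query"])]
    # 2. Aggregations are computed on the query matches, ignoring post_filter.
    field = body["aggs"]["all_weathers"]["terms"]["field"]
    buckets = dict(Counter(d[field] for d in matched))
    # 3. post_filter narrows only the hits that are returned.
    hits = [d for d in matched if clause_ok(d, body["post_filter"])]
    return {"hits": hits, "aggregations": {"all_weathers": buckets}}

result = run(DOCS, BODY)
print(len(result["hits"]))                     # 1
print(result["aggregations"]["all_weathers"])  # {'Rain': 2, 'Sunny': 1}
```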

Answer 3:

Quoting @Val's comment here for reference:

In option A [the filter aggregation], the aggregation will be run on ALL documents. In option B [the filter in the query], the documents are first filtered and the aggregation will be run only on the selected documents. Say you have 10M documents and the filter selects only 100; it's pretty evident that option B will always be faster.
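A back-of-the-envelope Python model of this argument, scaled down from 10M to 10,000 documents. It counts how many documents are handed to the aggregation step in each option, using a toy inverted index (value to doc ids) for the query path. This is a naive model of the reasoning in the comment under those assumptions, not a measurement of how Elasticsearch actually executes either request; the data and the `option_a` / `option_b` functions are hypothetical.

```python
from collections import defaultdict

N_DOCS = 10_000
N_MATCHING = 100  # docs that pass the filter

# Hypothetical docs: the first N_MATCHING are t-shirts.
docs = [{"type": "t-shirt" if i < N_MATCHING else "other"} for i in range(N_DOCS)]

# Toy inverted index (value -> doc ids), the kind of structure a term query uses.
index = defaultdict(list)
for doc_id, d in enumerate(docs):
    index[d["type"]].append(doc_id)

def option_a(docs):
    """Filter aggregation under an implicit match_all: every doc is handed
    to the aggregation, which tests the filter on each one."""
    visited, count = 0, 0
    for d in docs:
        visited += 1
        if d["type"] == "t-shirt":
            count += 1
    return visited, count

def option_b(index):
    """Filter in the query: the posting list yields only matching ids,
    so the aggregation is handed only those docs."""
    visited = 0
    for _doc_id in index["t-shirt"]:
        visited += 1
    return visited, len(index["t-shirt"])

print(option_a(docs))   # (10000, 100)
print(option_b(index))  # (100, 100)
```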

Source: https://stackoverflow.com/questions/57667127/filter-then-aggregation-or-just-filter-aggregation

Tags

ElasticSearch

elasticsearch-aggregation