{\'country\': \'France\', \'collected\': \'2018-03-12\', \'active\': true}
{\'country\': \'France\', \'collected\': \'2018-03-13\', \'active\': true}
{\'country\': \
It sounds like you are looking to filter the aggregation results. I believe you'll need to use a filter bucket for this
Something like:
{
"aggs": {
"group": {
"terms": {
"field": "country"
},
"filter": {
"term": {
"active": true
}
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"collected": {
"order": "desc"
}
}
]
}
}
}
}
}
}
Why don't you filter them beforehand:
POST /test/_search
{
"query": {
"bool": {
"filter": [
{
"term":{
"active": true
}
}
]
}
},
"aggs": {
"group": {
"terms": {
"field": "country"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 2,
"sort": [
{
"collected":
{"order":"desc"}
}
]
}
}
}
}
}
}
This query will group your data by county where active is true.
More info: filter context
I think you can get away with sorting by two fields in your top_hits
: by active
and by collected
. Basically, you want true
s to be first and when equal, then sort by collected
. Something like the following will always show the active:true
documents sorted by collected
.
The only downside to this solution is that if you don't have any active documents, top_hits
will show one active:false
document.
{
"size": 0,
"aggs": {
"group": {
"terms": {
"field": "country"
},
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{
"active": {
"order": "desc"
},
"collected": {
"order": "desc"
}
}
]
}
}
}
}
}
}
Generally, you can nest aggregations as needed to achieve any outcome. In this case, adding a filter bucket aggregation in between should achieve the desired outcome.
{
"size": 0,
"aggs": {
"group": {
"terms": { "field": "country" },
"aggs": {
"active_in_group": {
"filter" : { "term": { "active": true } },
"aggs": {
"group_docs": {
"top_hits": {
"size": 1,
"sort": [
{ "collected": { "order": "desc" } }
]
}
}
}
}
}
}
}
}
Here you have:
Agg level 1 - terms bucket; what is the count of each country in your result set (active or inactive)
Agg level 2 - filter bucket; what is the count of active items within each country bucket
Agg level 3 - top hits; what is the top result (most recently collected, according to your sort) of the active items within each country bucket
As you can see, any nested aggregation always respects the aggregations it is nested within.
One thing I'm unclear on, is if you want the count within each country bucket to reflect only the active items, or also the inactive items, or if you don't care about the counts at all and you're just using the term buckets to get the top hits within each country.
If you want the counts to reflect only the active items, then reverse the term and the filter aggregations, if you want the counts to include active and inactive, keep this order. If you don't care about the counts, the order doesn't matter.
This will of course add a level of aggregation to your results (the count of active items within each country), but that should be easy enough to overcome / ignore when parsing the results.
This solution has been verified to work in elastic 6.X, but I can see you must still be on elastic 1.x for some reason since you're using search_type=count which was deprecated in elastic 2.x. This solution should still work since these specific aggregations haven't changed for some time but I can't verify that there isn't some bug or something that has since been patched since elastic 1.x is very out of date. For future ref, elastic changes a lot from version to version. You generally want to include your version in any questions about elastic and check the version on any answers. In any event, I'd recommend an upgrade if you can.