Question
I'm trying to run an aggregation on a field while ignoring specific values. I have a field, path, that holds many different URL paths.
{
  "size": 0,
  "aggs": {
    "paths": {
      "terms": {
        "field": "path" // count the number of unique path values
      }
    }
  },
  "filter": {
    "bool": {
      "must_not": [
        {
          "regexp": {
            // path MUST NOT contain media or cache
            "path": {
              "value": "(\/media\b|\bcache\b)"
            }
          }
        }
      ]
    }
  }
}
When I run this, documents whose path contains cache or media are not filtered out.
The results are the same whether I leave the filter in or remove it.
Answer 1:
You could try excluding those values inside the terms aggregation, like this:
{
  "size": 0,
  "aggs": {
    "path": {
      "terms": {
        "field": "path",
        "exclude": ".*(media|cache).*"
      }
    }
  }
}
Caution, from the documentation:
Note: The performance of a regexp query heavily depends on the regular expression chosen. Matching everything like .* is very slow as well as using lookaround regular expressions. If possible, you should try to use a long prefix before your regular expression starts
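Another approach is to drop those documents at the query stage: move your filter into the query and then aggregate on the remaining results. A top-level filter behaves like a post_filter, which narrows the hits only after the aggregations have been computed, and that would explain why your results do not change.
EDIT: With date filter
You could add a date filter to the query so that you only get the past day's results. Something like this should work: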
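A minimal sketch of that query-stage approach, reusing the path field and the pattern from the exclude example above. It assumes path is indexed as a single, not-analyzed term, so the regexp sees the whole path. The pattern is wrapped in .* because Lucene regular expressions are matched against the entire term and, to my knowledge, do not support \b. Treat it as a starting point rather than a tested query for your mapping:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must_not": [
        {
          "regexp": {
            "path": ".*(media|cache).*"
          }
        }
      ]
    }
  },
  "aggs": {
    "paths": {
      "terms": {
        "field": "path"
      }
    }
  }
}
```

Because the exclusion happens in the query, the terms aggregation only ever sees documents whose path does not match the pattern.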
{
  "query": {
    "range": {
      "name_of_date_field": {
        "gte": "now-1d"
      }
    }
  },
  "size": 0,
  "aggs": {
    "path": {
      "terms": {
        "field": "path",
        "exclude": ".*(media|cache).*"
      }
    }
  }
}
Source: https://stackoverflow.com/questions/39737104/elasticsearch-run-aggregation-on-field-filter-out-specific-values-using-a-reg