Question
I have an Elasticsearch index (say file) where I append a document every time the file is downloaded by a client.
Each document is quite basic: it contains a field named filename and a date field named when that records the time of the download.
What I want to achieve is to get, for each file, the number of times it has been downloaded in the last 3 months.
So far, the closest I have gotten is this query:
{
  "query": {
    "range": {
      "when": {
        "gte": "now-3M"
      }
    }
  },
  "aggs": {
    "downloads": {
      "terms": {
        "field": "filename.keyword"
      }
    }
  }
}
The result looks something like this:
{
  "took": 793,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "file",
        "_type": "_doc",
        "_id": "8DkTFHQB3kG435svAA3O",
        "_score": 1.0,
        "_source": {
          "filename": "taz",
          "id": 24009,
          "when": "2020-08-21T08:11:54.943Z"
        }
      },
      ...
    ]
  },
  "aggregations": {
    "downloads": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 418486,
      "buckets": [
        {
          "key": "file1",
          "doc_count": 313873
        },
        {
          "key": "file2",
          "doc_count": 281504
        },
        ...,
        {
          "key": "file10",
          "doc_count": 10662
        }
      ]
    }
  }
}
So I am mostly interested in aggregations.downloads.buckets, but this is limited to 10 results.
What do I need to change in my query to get the full list (in my case, I will have ~15,000 different files)?
Thanks.
Answer 1:
The size of the terms aggregation defaults to 10, so only the top 10 buckets are returned. If you want more, set size explicitly:
{
  "query": {
    "range": {
      "when": {
        "gte": "now-3M"
      }
    }
  },
  "aggs": {
    "downloads": {
      "terms": {
        "field": "filename.keyword",
        "size": 15000
      }
    }
  }
}
Note that there are strategies to paginate those buckets using a composite aggregation.
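As a rough illustration of that pagination, here is a minimal Python sketch. It assumes a hypothetical search callable that takes a request body and returns the parsed JSON response (for instance a thin wrapper around whatever client you use); the function name, the aggregation source name filename, and the page size are illustrative choices, not part of the original question.

```python
from typing import Callable, Iterator, Tuple


def iter_download_counts(
    search: Callable[[dict], dict], page_size: int = 1000
) -> Iterator[Tuple[str, int]]:
    """Yield (filename, doc_count) pairs, one composite page at a time.

    `search` is a hypothetical callable: it receives the request body and
    returns the parsed response as a dict.
    """
    body = {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"when": {"gte": "now-3M"}}},
        "aggs": {
            "downloads": {
                "composite": {
                    "size": page_size,
                    "sources": [
                        {"filename": {"terms": {"field": "filename.keyword"}}}
                    ],
                }
            }
        },
    }
    composite = body["aggs"]["downloads"]["composite"]

    while True:
        agg = search(body)["aggregations"]["downloads"]
        for bucket in agg["buckets"]:
            # Composite keys are objects with one entry per source.
            yield bucket["key"]["filename"], bucket["doc_count"]

        after_key = agg.get("after_key")
        if not agg["buckets"] or after_key is None:
            break  # no more pages
        # Resume the next page right after the last key of this one.
        composite["after"] = after_key
```

Collecting the result with dict(iter_download_counts(search)) gives one entry per file. Note that composite buckets come back ordered by key rather than by doc_count, so any "most downloaded first" ordering has to be done on the client side.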
Also note that as your index grows, you may hit the default limit on the number of buckets per response (search.max_buckets) as well. It's a dynamic cluster-wide setting, so it can be changed.
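For reference, a small sketch of what changing that limit could look like, assuming the limit in question is the search.max_buckets setting. The helper name and the value 20000 are hypothetical; the function only builds the request body, which would then be sent with PUT /_cluster/settings.

```python
def max_buckets_settings(limit: int, persistent: bool = True) -> dict:
    """Build a body for PUT /_cluster/settings that sets search.max_buckets.

    Hypothetical helper: it only constructs the JSON body and sends nothing.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # "persistent" survives a full cluster restart, "transient" does not.
    scope = "persistent" if persistent else "transient"
    return {scope: {"search.max_buckets": limit}}


# Example body; 20000 is an arbitrary illustrative value.
body = max_buckets_settings(20000)
```

Raising the limit trades memory for convenience, so paginating is often the safer choice once the number of distinct files keeps growing.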
Source: https://stackoverflow.com/questions/63939748/how-to-get-the-number-of-documents-for-each-occurence-in-elastic