How to get the number of documents for each occurrence in Elastic?

Question

I have an Elastic index (say file) to which I append a document every time a file is downloaded by a client. Each document is quite basic: it contains a filename field and a date field, when, that records the time of the download.

What I want to achieve is to get, for each file, the number of times it has been downloaded in the last 3 months.

For the moment, the closest I get is with this query:

{
    "query": {
        "range": {
            "when": {
                "gte": "now-3M"
            }
        }
    },
    "aggs": {
        "downloads": {
            "terms": {
                "field": "filename.keyword"
            }
        }
    }
}

The result is something like this:

{
    "took": 793,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 10000,
            "relation": "gte"
        },
        "max_score": 1.0,
        "hits": [
            {
                "_index": "file",
                "_type": "_doc",
                "_id": "8DkTFHQB3kG435svAA3O",
                "_score": 1.0,
                "_source": {
                    "filename": "taz",
                    "id": 24009,
                    "when": "2020-08-21T08:11:54.943Z"
                }
            },
            ...
        ]
    },
    "aggregations": {
        "downloads": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 418486,
            "buckets": [
                {
                    "key": "file1",
                    "doc_count": 313873
                },
                {
                    "key": "file2",
                    "doc_count": 281504
                },
                ...,
                {
                    "key": "file10",
                    "doc_count": 10662
                }
            ]
        }
    }
}

So I am interested in aggregations.downloads.buckets, but this is limited to 10 results.

What do I need to change in my query to get the full list (in my case, I will have ~15,000 different files)?

Thanks.

Answer 1:

The size parameter of a terms aggregation defaults to 10, so only the top 10 buckets are returned. To increase it, set size explicitly:

{
    "query": {
        "range": {
            "when": {
                "gte": "now-3M"
            }
        }
    },
    "aggs": {
        "downloads": {
            "terms": {
                "field": "filename.keyword",
                "size": 15000
            }
        }
    }
}

Note that there are strategies to paginate those buckets using a composite aggregation.
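As a rough illustration of that pagination, here is a minimal Python sketch. It assumes the same index layout as in the question (filename.keyword and when). The names build_body, iter_download_counts and the injected search callable are hypothetical helpers, not part of any Elasticsearch client. The sketch relies on the composite aggregation returning an after_key with each non-empty page, which is passed back as after to fetch the next page.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

# Hypothetical type: any callable that takes a request body and returns
# the parsed search response (for example a thin wrapper around a client).
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_body(after: Optional[Dict[str, Any]], page_size: int) -> Dict[str, Any]:
    """Build one page of the composite aggregation request."""
    composite: Dict[str, Any] = {
        "size": page_size,  # buckets per page, not the total
        "sources": [
            {"filename": {"terms": {"field": "filename.keyword"}}}
        ],
    }
    if after is not None:
        # Resume right after the last bucket of the previous page.
        composite["after"] = after
    return {
        "size": 0,  # skip the hits, only the buckets are needed
        "query": {"range": {"when": {"gte": "now-3M"}}},
        "aggs": {"downloads": {"composite": composite}},
    }


def iter_download_counts(
    search: SearchFn, page_size: int = 1000
) -> Iterator[Tuple[str, int]]:
    """Yield (filename, doc_count) pairs, one page of buckets at a time."""
    after: Optional[Dict[str, Any]] = None
    while True:
        response = search(build_body(after, page_size))
        agg = response["aggregations"]["downloads"]
        for bucket in agg["buckets"]:
            yield bucket["key"]["filename"], bucket["doc_count"]
        after = agg.get("after_key")
        # An empty page (or a missing after_key) means everything was read.
        if after is None or not agg["buckets"]:
            break
```

With a real client, search would be a small wrapper that sends the body to the file index and returns the response as a dict; the exact client call depends on the client version, so it is left out here. Note that composite buckets come back ordered by key rather than by doc_count, so any sorting by download count has to happen on the client side.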

Also note that as your index grows and the number of distinct files increases, you may hit the default bucket limit (search.max_buckets) as well. It's a dynamic cluster-wide setting, so it can be changed without a restart.
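For reference, a small sketch of what changing that limit could look like, assuming the limit in question is the search.max_buckets cluster setting. The helper max_buckets_settings is hypothetical; it only builds the body that would be sent with PUT /_cluster/settings, and the value 20000 in the usage line is an arbitrary example.

```python
from typing import Any, Dict


def max_buckets_settings(limit: int) -> Dict[str, Any]:
    """Body for PUT /_cluster/settings that raises the bucket limit.

    Assumes the limit being hit is search.max_buckets.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # "persistent" settings survive a full cluster restart.
    return {"persistent": {"search.max_buckets": limit}}


body = max_buckets_settings(20000)
```

Raising the limit lets a single request return more buckets at the cost of more memory per request, so paginating is usually the gentler option when the number of files keeps growing.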

Source: https://stackoverflow.com/questions/63939748/how-to-get-the-number-of-documents-for-each-occurence-in-elastic

Tags

Elasticsearch