I am trying to calculate the total number of times a particular term occurs throughout an entire index (term collection frequency). I have attempted to do so through the use of
I believe you need to set term_statistics to true, as per the Elasticsearch documentation:
Term statistics: Setting term_statistics to true (default is false) will return:
total term frequency (how often a term occurs in all documents)
document frequency (the number of documents containing the current term)
By default these values are not returned since term statistics can have a serious performance impact.
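A minimal sketch of how the request body and the response fit together, in Python using only the standard library. The index name `my_index`, the field `my_field`, the term `my_keyword` and the sample response are hypothetical placeholders, not real output. The body keys (`fields`, `term_statistics`, `field_statistics`) and response keys (`term_vectors`, `terms`, `ttf`, `doc_freq`, `term_freq`) follow the term vectors documentation, but check them against your Elasticsearch version.

```python
import json


def termvectors_body(field: str) -> dict:
    """Body for GET /<index>/_termvectors/<id> with term statistics enabled."""
    return {
        "fields": [field],
        "term_statistics": True,   # adds ttf and doc_freq per term (default false)
        "field_statistics": True,  # adds sum_ttf, sum_doc_freq, doc_count per field
    }


def term_stats(response: dict, field: str, term: str) -> dict:
    """Extract ttf / doc_freq / term_freq for one term from a parsed response."""
    info = response["term_vectors"][field]["terms"].get(term)
    if info is None:
        return {}  # term not present in this document's term vector
    return {k: info[k] for k in ("ttf", "doc_freq", "term_freq") if k in info}


# Hypothetical response shape for illustration only, NOT real cluster output.
sample_response = json.loads("""
{
  "_index": "my_index", "_id": "1", "found": true,
  "term_vectors": {
    "my_field": {
      "field_statistics": {"sum_doc_freq": 15, "doc_count": 2, "sum_ttf": 16},
      "terms": {
        "my_keyword": {"doc_freq": 2, "ttf": 4, "term_freq": 3}
      }
    }
  }
}
""")

print(json.dumps(termvectors_body("my_field")))
print(term_stats(sample_response, "my_field", "my_keyword"))
```

Here `ttf` is the figure the question is after, while `term_freq` only covers the single requested document.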
Have you tried simply using the Count API? https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-count.html
It returns the number of documents matching a query (documents, not individual term occurrences), so something like this may work:
GET /my_index/_count
{
  "query": {
    "match": { "my_field": "my_keyword" }
  }
}
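To make the distinction concrete: `_count` with a single-term match query gives roughly the document frequency, whereas the question asks for collection frequency (total occurrences). The toy corpus below is hypothetical, and the tokenizer is a crude lowercase-and-split stand-in for a real Elasticsearch analyzer; it only shows how far apart the two numbers can be.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude stand-in for an analyzer: lowercase, then split on whitespace.
    return text.lower().split()


def document_frequency(docs: list[str], term: str) -> int:
    """Number of documents containing the term (roughly what match + _count gives)."""
    return sum(1 for d in docs if term in tokenize(d))


def collection_frequency(docs: list[str], term: str) -> int:
    """Total occurrences of the term across all documents (index-wide ttf)."""
    return sum(Counter(tokenize(d))[term] for d in docs)


# Hypothetical toy corpus.
docs = [
    "apple banana apple",
    "banana cherry",
    "apple apple apple",
]
print(document_frequency(docs, "apple"))    # 2
print(collection_frequency(docs, "apple"))  # 5
```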
The count differs because term vector statistics are not accurate unless the index has a single shard. In an index with multiple shards, the documents are distributed across the shards, and the statistics returned are computed only for the shard that holds the requested document, not for the whole index.
Thus, the returned frequency is only a relative measure, not the absolute value you expect; see the Behaviour section of the term vectors documentation. To test this, create a single-shard index and request the frequency again: it should then give you the actual total.
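A small simulation of why shard-local statistics undercount. This is a simplified model, not Elasticsearch's actual routing: documents are assigned to shards by a hypothetical `doc_id % num_shards` rule (Elasticsearch really hashes the routing value), and the `ttf` reported for a document is taken only from the shard that holds it. The document ids and texts are made up.

```python
from collections import Counter


def shard_ttf(docs: dict[int, str], term: str, num_shards: int) -> list[int]:
    """Per-shard total term frequency under simplified doc_id % num_shards routing."""
    per_shard = [0] * num_shards
    for doc_id, text in docs.items():
        shard = doc_id % num_shards  # simplified routing, not Elasticsearch's hash
        per_shard[shard] += Counter(text.lower().split())[term]
    return per_shard


def ttf_seen_by_termvectors(docs: dict[int, str], term: str,
                            num_shards: int, doc_id: int) -> int:
    """ttf reported when requesting doc_id: only its own shard's statistics."""
    return shard_ttf(docs, term, num_shards)[doc_id % num_shards]


# Hypothetical documents keyed by id.
docs = {
    0: "apple banana apple",
    1: "banana cherry apple",
    2: "apple apple apple",
    3: "cherry",
}
print(sum(shard_ttf(docs, "apple", 2)))              # 6: true index-wide total
print(ttf_seen_by_termvectors(docs, "apple", 2, 0))  # 5: shard 0 only (docs 0 and 2)
print(ttf_seen_by_termvectors(docs, "apple", 1, 0))  # 6: one shard sees everything
```

With two shards the answer depends on which document you ask about (5 or 1 here). With a single shard every request sees the full total, which matches the suggested test above.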