Getting total term frequency throughout entire index (Elasticsearch)

星月不相逢 2021-02-05 13:25

I am trying to calculate the total number of times a particular term occurs throughout an entire index (term collection frequency). I have attempted to do so through the use of

3 Answers
  • 2021-02-05 14:02

    I believe you need to set term_statistics to true, as per the Elasticsearch documentation:

    Term statistics
    Setting term_statistics to true (default is false) will return:

      • total term frequency (how often a term occurs in all documents)
      • document frequency (the number of documents containing the current term)

    By default these values are not returned since term statistics can have a serious performance impact.
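    A minimal Python sketch of such a request, assuming a cluster at http://localhost:9200 and the hypothetical names my_index, document id 1 and my_field. It only builds the request with the standard library (POST /<index>/_termvectors/<id> with term_statistics enabled) and reads doc_freq and ttf from an already parsed response. Keep in mind that the term vectors API only reports terms that occur in the requested document, so the id must point at a document containing the term.

```python
import json
import urllib.request


def build_termvectors_request(base_url, index, doc_id, field):
    """Build (but do not send) a term vectors request with term statistics on."""
    body = {
        "fields": [field],
        "term_statistics": True,   # adds doc_freq and ttf for every term
        "field_statistics": True,  # adds sum_doc_freq, doc_count, sum_ttf
        "positions": False,        # not needed for frequencies; smaller response
        "offsets": False,
    }
    url = f"{base_url}/{index}/_termvectors/{doc_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def term_stats(response, field, term):
    """Return (doc_freq, ttf) for `term` from a parsed response, or None."""
    info = response["term_vectors"][field]["terms"].get(term)
    if info is None:
        return None  # term does not occur in the requested document
    return info.get("doc_freq"), info.get("ttf")


# Hypothetical usage against a running cluster:
# req = build_termvectors_request("http://localhost:9200", "my_index", "1", "my_field")
# with urllib.request.urlopen(req) as resp:
#     print(term_stats(json.load(resp), "my_field", "my_keyword"))
```

    The ttf value is the "total term frequency" described above; as another answer points out, on a multi-shard index it is not an index-wide figure.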

  • 2021-02-05 14:06

    Have you tried simply using the Count API? https://www.elastic.co/guide/en/elasticsearch/reference/7.6/search-count.html

    It returns the number of documents that match a query, so something like this may work:

    GET /my_index/_count
    {
        "query": {"match": {"my_field": "my_keyword"}}
    }
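    Because the Count API counts matching documents, its result corresponds to document frequency rather than to how many times the term occurs in total. A toy, in-memory Python illustration of the difference (no Elasticsearch involved; the lowercase whitespace tokenizer is an assumption standing in for a real analyzer):

```python
from collections import Counter


def tokenize(text):
    # Simplistic stand-in for an Elasticsearch analyzer.
    return text.lower().split()


def doc_freq_and_ttf(docs, term):
    """Return (number of docs containing term, total occurrences of term)."""
    doc_freq = 0
    ttf = 0
    for text in docs:
        n = Counter(tokenize(text))[term]  # 0 if the term is absent
        if n:
            doc_freq += 1
        ttf += n
    return doc_freq, ttf


docs = ["apple banana apple", "banana", "apple apple apple"]
print(doc_freq_and_ttf(docs, "apple"))  # (2, 5): 2 matching docs, 5 occurrences
```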
    
  • 2021-02-05 14:15

    The count differs because the term statistics returned by the term vectors API are not exact unless the index has a single shard. In an index with multiple shards, the documents are distributed across the shards, so the returned frequency is not the index-wide total but only the value for the shard that holds the requested document.

    The returned frequency is therefore only a relative measure, not the absolute value you expect; see the Behaviour section of the term vectors documentation. To test this, create a single-shard index and request the frequency again (it should give you the actual total).
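    A toy Python simulation of the effect, under loudly stated assumptions: integer document ids and a plain modulo for routing (Elasticsearch really hashes the _routing value, so real placement differs). The Behaviour section of the term vectors documentation says statistics are taken only from the shard the requested document lives on; the sketch shows that this per-shard figure can be smaller than the index-wide total, while a single shard reports the full count.

```python
def shard_for(doc_id, num_shards):
    # Toy routing only; NOT Elasticsearch's murmur3-based routing.
    return doc_id % num_shards


def per_shard_ttf(docs, term, num_shards):
    """Total occurrences of `term` on each shard, for integer doc ids."""
    totals = [0] * num_shards
    for doc_id, text in docs.items():
        totals[shard_for(doc_id, num_shards)] += text.lower().split().count(term)
    return totals


docs = {
    0: "apple banana apple",
    1: "banana",
    2: "apple apple apple",
    3: "apple",
}

two = per_shard_ttf(docs, "apple", 2)
print(two)                              # [5, 1]
print(two[shard_for(0, 2)])             # 5: only the shard holding doc 0
print(per_shard_ttf(docs, "apple", 1))  # [6]: single shard sees everything
```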
