How can I get top terms for a subset of documents in a Lucene index?

鱼传尺愫 2021-01-31 12:45

I know it's possible to get the top terms within a Lucene index, but is there a way to get the top terms based on a subset of a Lucene index?

I.e. What are the top terms

2 answers
  •  傲寒 (original poster)
    2021-01-31 13:34

    Counting up the TermVectors will work, but it will be slow if there are many documents to iterate over. Also note that if by "top terms" you mean docFreq, you should not use the counts in the TermFreqVector; just count each term as binary (present or absent in a document).
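To make the difference between the two counting modes concrete, here is a minimal sketch in plain Java. It does not call Lucene: each document's term vector is modelled as a hypothetical `Map<String, Integer>` (term to in-document frequency), and the class and method names are assumptions made for illustration only. With a real index you would fill these maps from the term vectors of the documents in your subset.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, not Lucene API: a term vector is a map of term -> frequency.
final class TermVectorCounts {

    // Adds up term counts over the subset. With binary=true every document
    // contributes at most 1 per term, which yields a docFreq-style count.
    static Map<String, Integer> count(List<Map<String, Integer>> subsetVectors, boolean binary) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> vector : subsetVectors) {
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                int increment = binary ? 1 : e.getValue();
                totals.merge(e.getKey(), increment, Integer::sum);
            }
        }
        return totals;
    }

    // Returns up to n terms, highest count first; ties are broken alphabetically.
    static List<String> top(Map<String, Integer> totals, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(totals.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> subset = new ArrayList<>();
        subset.add(Map.of("lucene", 3, "index", 1));
        subset.add(Map.of("lucene", 1, "search", 5));

        // Raw frequencies: search=5, lucene=4, index=1
        System.out.println(top(count(subset, false), 3)); // [search, lucene, index]
        // Binary (docFreq-style): lucene=2, index=1, search=1
        System.out.println(top(count(subset, true), 3));  // [lucene, index, search]
    }
}
```

Note how "search" ranks first by raw frequency but drops behind "lucene" once each document counts only once per term. The cost is still one pass over every term of every document in the subset, which is why this gets slow for large subsets.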

    Alternatively, you could iterate over the terms as you would for facet counts: use a cached filter for every term, and intersect each filter's BitSet with the BitSet of the document subset to get a fast intersection count.
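The intersection-count idea can be sketched with `java.util.BitSet` from the standard library. This is an assumption-laden illustration rather than Lucene code: the `termDocs` map stands in for the cached per-term filters, `subset` stands in for the documents matching your query or filter, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in, not Lucene API: bit i is set when document i matches.
final class BitSetFacetCounts {

    // Convenience helper to build a BitSet from document ids.
    static BitSet bits(int... docIds) {
        BitSet set = new BitSet();
        for (int id : docIds) {
            set.set(id);
        }
        return set;
    }

    // Number of documents present in both sets; the inputs are left unmodified.
    static int intersectionCount(BitSet a, BitSet b) {
        BitSet copy = (BitSet) a.clone();
        copy.and(b);
        return copy.cardinality();
    }

    // For every term, counts how many subset documents contain it (a docFreq
    // restricted to the subset). Terms that never occur in the subset are skipped.
    static Map<String, Integer> countsInSubset(Map<String, BitSet> termDocs, BitSet subset) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map.Entry<String, BitSet> e : termDocs.entrySet()) {
            int n = intersectionCount(e.getValue(), subset);
            if (n > 0) {
                counts.put(e.getKey(), n);
            }
        }
        return counts;
    }

    // Returns up to n terms, highest count first; ties are broken alphabetically.
    static List<String> top(Map<String, Integer> counts, int n) {
        List<String> terms = new ArrayList<>(counts.keySet());
        terms.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(n, terms.size())));
    }

    public static void main(String[] args) {
        Map<String, BitSet> termDocs = new HashMap<>();
        termDocs.put("lucene", bits(0, 1, 2, 5));
        termDocs.put("index", bits(1, 3));
        termDocs.put("search", bits(4, 5));

        BitSet subset = bits(1, 2, 3);
        System.out.println(top(countsInSubset(termDocs, subset), 2)); // [index, lucene]
    }
}
```

This trades memory for speed: one bit set per term has to be kept around, but each count is a bitwise AND plus a population count instead of a walk over stored term vectors. Note that it produces document counts only, not within-document frequencies.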
