How can I get top terms for a subset of documents in a Lucene index?

鱼传尺愫 2021-01-31 12:45

I know it's possible to get the top terms within a Lucene index, but is there a way to get the top terms based on a subset of a Lucene index?

I.e. What are the top terms

2 answers

傲寒 (original poster)

2021-01-31 13:34

Counting up the TermVectors will work, but it will be slow if there are a lot of documents to iterate over. Also note that if by "top terms" you mean docFreq, then don't use the counts in the TermFreqVector; just count each term as binary (once per document).
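As a rough illustration of the counting approach, here is a minimal sketch in plain Java. It does not call Lucene: the map of per-document term frequencies stands in for whatever term vectors you read from the index, and the names SubsetTopTerms and topTerms are made up for this example. The binary flag switches between summing in-document frequencies and counting each term once per document (the docFreq-style count).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: the nested map stands in for
// term vectors (docId -> term -> frequency within that document).
final class SubsetTopTerms {

    static List<Map.Entry<String, Integer>> topTerms(
            Map<Integer, Map<String, Integer>> termVectors,
            Iterable<Integer> subsetDocIds,
            int n,
            boolean binary) {
        Map<String, Integer> counts = new HashMap<>();
        for (int docId : subsetDocIds) {
            Map<String, Integer> vector = termVectors.get(docId);
            if (vector == null) {
                continue; // no term vector available for this document
            }
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                // binary: count the term once per document (docFreq-style);
                // otherwise add its frequency within the document.
                int increment = binary ? 1 : e.getValue();
                counts.merge(e.getKey(), increment, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        // Highest count first; ties broken alphabetically for a stable order.
        ranked.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(ranked.subList(0, Math.min(n, ranked.size())));
    }
}
```

The cost grows with the number of documents in the subset times the number of distinct terms per document, which is why this gets slow for large subsets.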

Alternatively, you could iterate over the terms as you would for facet counts. Use a cached filter for every term; their BitSets can then be intersected with the BitSet of your document subset for a fast intersection count.
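The intersection idea can be sketched with java.util.BitSet alone. This is only an assumption-laden illustration: in a real index the per-term bit sets would come from the cached filters (the exact classes depend on your Lucene version), and BitSetTermCounts, intersectionCount and docFreqInSubset are hypothetical names.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: each BitSet has bit i set
// when document i contains the term (as a cached per-term filter would).
final class BitSetTermCounts {

    // Number of documents that contain the term AND belong to the subset.
    static int intersectionCount(BitSet termDocs, BitSet subset) {
        BitSet both = (BitSet) termDocs.clone(); // leave the cached set untouched
        both.and(subset);
        return both.cardinality();
    }

    // docFreq of every term restricted to the subset; terms with no hits are skipped.
    static Map<String, Integer> docFreqInSubset(
            Map<String, BitSet> cachedTermDocs, BitSet subset) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : cachedTermDocs.entrySet()) {
            int df = intersectionCount(e.getValue(), subset);
            if (df > 0) {
                result.put(e.getKey(), df);
            }
        }
        return result;
    }
}
```

Here the work is one bitwise AND per term rather than one pass per document, so it tends to pay off when the subset is large and the set of candidate terms is limited. Sorting the resulting map by value gives the top terms.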
