Extract tf-idf vectors with Lucene

遥遥无期 2021-01-31 00:01

I have indexed a set of documents using Lucene. I have also stored a DocumentTermVector for each document's content. I wrote a program and got the term frequency vector for each doc

1 answer
  • 2021-01-31 00:39

    You probably won't find a ready-made tf-idf vector. But as you've already done, you can calculate the IDF by hand. It is probably better to let DefaultSimilarity (or whichever Similarity implementation you are using) calculate it for you.
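
    To make the "by hand" route concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes the weighting documented for the classic DefaultSimilarity, tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)); check this against the Similarity you actually use. The class name TfIdfSketch, its methods, and the sample numbers are hypothetical. In a real program the frequencies would come from the stored term vector, and the document counts from the index reader (for example docFreq and numDocs).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of Lucene, it only mirrors the assumed classic formulas.
final class TfIdfSketch {

    // Assumed classic term-frequency factor: sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumed classic inverse document frequency: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Combines one document's term frequencies with index-wide document
    // frequencies into a tf-idf weight per term.
    static Map<String, Double> tfIdf(Map<String, Integer> termFreqs,
                                     Map<String, Integer> docFreqs,
                                     int numDocs) {
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            // A term missing from docFreqs is treated as appearing in no document.
            int df = docFreqs.getOrDefault(e.getKey(), 0);
            weights.put(e.getKey(), tf(e.getValue()) * idf(df, numDocs));
        }
        return weights;
    }

    public static void main(String[] args) {
        // Made-up numbers standing in for values read from an index.
        Map<String, Integer> termFreqs = new LinkedHashMap<>();
        termFreqs.put("lucene", 4);
        termFreqs.put("index", 1);

        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("lucene", 1);
        docFreqs.put("index", 9);

        System.out.println(tfIdf(termFreqs, docFreqs, 10));
    }
}
```

    Keeping the two factors as separate methods makes it easy to swap in whatever your own Similarity computes, which is the safer option if the scores must match Lucene's ranking.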

    As for the term ID, I don't think you can get it currently, at least not until Lucene 4.0; see this.
