TFIDF calculating confusion

佛祖请我去吃肉 2021-02-04 15:47

I found the following code on the internet for calculating TFIDF:

https://github.com/timtrueman/tf-idf/blob/master/tf-idf.py

I added "1+" in

2 answers
  •  小蘑菇 (OP)
    2021-02-04 16:07

    If the word in question is contained in every document in the collection, your 1+ change will result in a negative value. This is because 0 < x / (1 + x) < 1 holds for all x > 0, and the logarithm of a number between 0 and 1 is negative.

    In my opinion, the correct IDF for a nonexistent word is infinite or undefined. However, if you add 1+ to both the denominator and the numerator, a nonexistent word will have an IDF slightly higher than that of any existing word, and words that exist in every document will have an IDF of zero. Both cases will probably work well with your code.
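To make the two cases concrete, here is a minimal sketch in Python. It assumes the IDF is computed as the natural log of (number of documents / number of documents containing the word), which may not match the linked script exactly. The function names `idf_denominator_only` and `idf_smoothed` and the collection size of 10 are made up for illustration.

```python
import math


def idf_denominator_only(num_docs, doc_freq):
    """IDF with 1+ added to the denominator only (hypothetical name)."""
    return math.log(num_docs / (1 + doc_freq))


def idf_smoothed(num_docs, doc_freq):
    """IDF with 1+ added to both numerator and denominator (hypothetical name)."""
    return math.log((1 + num_docs) / (1 + doc_freq))


if __name__ == "__main__":
    num_docs = 10  # assumed collection size, for illustration only
    # doc_freq = 0: word appears nowhere; doc_freq = num_docs: word appears everywhere
    for doc_freq in (0, 1, 5, num_docs):
        print(
            doc_freq,
            idf_denominator_only(num_docs, doc_freq),
            idf_smoothed(num_docs, doc_freq),
        )
```

With the denominator-only version, the last row (a word found in every document) comes out negative. With 1+ on both sides, that row is exactly zero, and the first row (a word found in no document) is the largest value in the table.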
