About cosine similarity

囚心锁ツ 2021-01-16 07:45

I am computing the cosine similarity between documents. I did it like this:

D1 = (8, 0, 0, 1), where 8, 0, 0, 1 are the tf-idf scores of the terms t1, t2, t3, t4

D2=(7,0,

1 answer
  • 2021-01-16 08:39

    The denominator is wrong.

    The cosine similarity is defined as

             D1 · D2
     sim = ———————————
            |D1| |D2|
    

    Here

    D1 · D2 = 7*8 + 0*0 + 0*0 + 1*1 = 57
       |D2| = √(7^2 + 0^2 + 0^2 + 1^2) = √50
       |D1| = √(8^2 + 0^2 + 0^2 + 1^2) = √65
    

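    So the similarity should be 57 / √(50 * 65) ≈ 0.999846, not 5. A cosine similarity can never exceed 1, so a result of 5 already signals a mistake in the normalization.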
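
As a quick check of the arithmetic, here is a minimal Python sketch of the same formula. The function name `cosine_similarity` and the variables `d1` and `d2` are illustrative choices, not something from the question, and the vectors are the tf-idf scores quoted in this answer.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Numerator: dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: product of the two Euclidean norms.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# tf-idf vectors taken from the answer (illustrative variable names).
d1 = (8, 0, 0, 1)
d2 = (7, 0, 0, 1)

sim = cosine_similarity(d1, d2)
print(round(sim, 6))  # 0.999846
```

Dividing by the product of the norms is what keeps the result in the range [-1, 1]. For non-negative tf-idf vectors it stays in [0, 1].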
