about cosine similarity

Posted by ╄→尐↘猪︶ㄣ on 2019-12-01 11:08:41

Question


I am computing the cosine similarity between documents. I did it like this:

D1 = (8, 0, 0, 1), where 8, 0, 0, 1 are the tf-idf scores of the terms t1, t2, t3, t4

D2 = (7, 0, 0, 1)

cos(theta) = (56 + 0 + 0 + 1) / (sqrt(64 + 49) * sqrt(1 + 1))

which comes out to be

cos(theta) = 5

What should I conclude from this value? I don't understand what cos(theta) = 5 signifies about the similarity between the documents. Am I doing this right?


Answer 1:


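The denominator is wrong: it sums the squares per term across both documents (64 + 49 and 1 + 1) instead of taking the length of each document vector separately. A cosine can never exceed 1, so a value of 5 already signals an arithmetic error.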

The cosine similarity is defined as

         D1 · D2
 sim = ———————————
        |D1| |D2|

Here

D1 · D2 = (7*8 + 0*0 + 0*0 + 1*1) = 57
           ______________________    __
   |D2| = √ 7^2 + 0^2 + 0^2 + 1^2 = √50
           ______________________    __
   |D1| = √ 8^2 + 0^2 + 0^2 + 1^2 = √65

So the similarity should be (57 / √(50 * 65)) = 0.999846142, not 5.
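
The calculation above can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the function name cosine_similarity and the variables d1 and d2 are illustrative choices, not something from the question.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Numerator: dot product, summed term by term.
    dot = sum(x * y for x, y in zip(a, b))
    # Denominator: the length of each vector, computed per document.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

d1 = (8, 0, 0, 1)  # tf-idf scores of t1..t4 in D1
d2 = (7, 0, 0, 1)  # tf-idf scores of t1..t4 in D2
sim = cosine_similarity(d1, d2)
print(round(sim, 6))  # 0.999846
```

Because tf-idf scores are non-negative, the result for two such vectors always falls between 0 and 1, which makes it easy to spot a slip like the one in the question.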



Source: https://stackoverflow.com/questions/2859970/about-cosine-similarity