I am computing the cosine similarity between documents. I did it like this:
D1=(8,0,0,1), where 8, 0, 0, 1 are the tf-idf scores of the terms t1, t2, t3, t4
D2=(7,0,
The denominator is wrong.
The cosine similarity is defined as
sim = (D1 · D2) / (|D1| |D2|)
Here
D1 · D2 = (7*8 + 0*0 + 0*0 + 1*1) = 57
|D2| = √(7^2 + 0^2 + 0^2 + 1^2) = √50
|D1| = √(8^2 + 0^2 + 0^2 + 1^2) = √65
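So the similarity should be 57 / √(50 * 65) ≈ 0.999846142, not 5. As a sanity check, cosine similarity always lies between -1 and 1 (between 0 and 1 when all tf-idf scores are non-negative), so a value of 5 cannot be right.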
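
If it helps to check the arithmetic, here is a minimal Python sketch of the same calculation. The function name `cosine_similarity` is just an illustrative helper written for this answer, not something from a library, and D2 = (7,0,0,1) is taken from the dot product and norm worked out above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))      # D1 · D2
    norm_a = math.sqrt(sum(x * x for x in a))   # |D1|
    norm_b = math.sqrt(sum(y * y for y in b))   # |D2|
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

d1 = (8, 0, 0, 1)  # tf-idf scores of t1..t4 in D1
d2 = (7, 0, 0, 1)  # tf-idf scores of t1..t4 in D2, as used in the calculation above

print(round(cosine_similarity(d1, d2), 6))  # 0.999846
```

Dividing by the product of both norms is what keeps the result within [-1, 1], which is why a wrong denominator can produce a value like 5.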