Vector Space Model: how is the query vector [0, 0.707, 0.707] calculated?

浪子不回头ぞ submitted on 2019-12-13 07:53:43

Question


I'm reading the book "Introduction to Information Retrieval" (Christopher Manning) and I'm stuck on chapter 6, where it introduces the query "jealous gossip" and states that the associated unit vector is [0, 0.707, 0.707] ( https://nlp.stanford.edu/IR-book/html/htmledition/queries-as-vectors-1.html ), considering the terms affect, jealous and gossip.

I tried to calculate it by computing the tf-idf, assuming that:
- tf is equal to 1 for jealous and gossip;
- idf is always equal to 0 if we calculate it as log(N/df) with N = 1 (I have only 1 query and it is my document) and df = 1 for jealous and gossip, so log(1) = 0.

Since the idf is 0, the tf-idf is also 0. So I decided instead to compute every weight of the query vector as the raw tf divided by the Euclidean length. In this case the Euclidean length is sqrt(1+1)=1.

I can't work out the formula by which the book arrives at [0, 0.707, 0.707] as the query vector. Can someone help me?
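To see why the tf-idf route described above dead-ends, here is a minimal Python sketch of that exact calculation. It assumes base-10 logs (any base gives log(1) = 0), the three-term vocabulary affect, jealous, gossip, and that a term absent from the query simply gets weight 0; the function name `tf_idf_single_doc` is hypothetical.

```python
import math

# Vocabulary order follows the example: affect, jealous, gossip.
VOCAB = ["affect", "jealous", "gossip"]


def tf_idf_single_doc(terms: list[str]) -> list[float]:
    """Tf-idf weights when the query is treated as the only document (N = 1).

    Mirrors the calculation described in the question, not the book's method.
    """
    n_docs = 1
    weights = []
    for term in VOCAB:
        tf = terms.count(term)
        df = 1 if tf > 0 else 0
        # idf = log(N / df); with N = df = 1 this is log10(1) = 0.
        # Assumption: a term with df = 0 just gets idf 0 here.
        idf = math.log10(n_docs / df) if df else 0.0
        weights.append(tf * idf)
    return weights


weights = tf_idf_single_doc(["jealous", "gossip"])
print(weights)  # every weight is 0.0, so the vector has length 0 and cannot be normalized
```

With a single "document", every term has df = N, so idf wipes out all the weights; the book's query vector therefore cannot come from this computation.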


Answer 1:


I haven't worked through the problem, but I think the issue might be that sqrt(1+1) is sqrt(2), not 1, so when you normalize, each of the 1s becomes 1/sqrt(2) ≈ 0.707.
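A minimal Python sketch of that normalization, assuming the query weights are raw term frequencies over the vocabulary affect, jealous, gossip (the function name `unit_query_vector` is hypothetical):

```python
import math

# Vocabulary order follows the example: affect, jealous, gossip.
VOCAB = ["affect", "jealous", "gossip"]


def unit_query_vector(terms: list[str]) -> list[float]:
    """Raw term-frequency vector divided by its Euclidean length."""
    tf = [terms.count(t) for t in VOCAB]        # [0, 1, 1]
    length = math.sqrt(sum(x * x for x in tf))  # sqrt(0 + 1 + 1) = sqrt(2) ≈ 1.414
    return [x / length for x in tf]


v = unit_query_vector(["jealous", "gossip"])
print([round(x, 3) for x in v])  # [0.0, 0.707, 0.707]
```

Dividing by sqrt(2) rather than 1 is what turns the raw counts [0, 1, 1] into a unit vector whose components match the book's [0, 0.707, 0.707].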



Source: https://stackoverflow.com/questions/53585068/vector-space-model-query-vector-0-0-707-0-707-calculated
