Question
I'm reading the book "Introduction to Information Retrieval" (Christopher Manning) and I'm stuck on chapter 6, where it introduces the query "jealous gossip" and states that the associated unit vector is [0, 0.707, 0.707] (https://nlp.stanford.edu/IR-book/html/htmledition/queries-as-vectors-1.html), considering the terms affection, jealous and gossip.

I tried to calculate it by computing the tf-idf, assuming that:
- tf is equal to 1 for jealous and gossip
- idf is always equal to 0 if we calculate it as log(N/df) with N=1 (I have only one query, and it is my document) and df=1 for jealous and gossip, so log(1)=0

Since the idf is 0, the tf-idf is 0 as well. So I decided to compute each weight of the query vector as the raw tf divided by the Euclidean length. In this case the Euclidean length is sqrt(1+1)=1. I can't find the formula that yields [0, 0.707, 0.707] as the query vector. Can someone help me?
Answer 1:
I haven't worked through the problem, but I think the issue might be that sqrt(1+1) is sqrt(2), so when you normalize, each of the 1s becomes 1/sqrt(2) = 0.707.
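As a quick check, here is a minimal Python sketch of that normalization. It assumes the query weights are just the raw term frequencies [0, 1, 1] with no idf factor, and the name `l2_normalize` is only illustrative, not something from the book:

```python
import math

# Raw term frequencies for the query "jealous gossip" over three terms.
# Assumed order: the term absent from the query, then jealous, then gossip.
query_tf = [0, 1, 1]

def l2_normalize(vector):
    """Divide each weight by the Euclidean (L2) length of the vector."""
    length = math.sqrt(sum(w * w for w in vector))  # sqrt(0 + 1 + 1) = sqrt(2)
    if length == 0:
        return list(vector)  # an all-zero vector cannot be normalized
    return [w / length for w in vector]

unit_query = l2_normalize(query_tf)
print([round(w, 3) for w in unit_query])  # [0.0, 0.707, 0.707]
```

The Euclidean length here is sqrt(2), about 1.414, not 1, which is why each nonzero weight ends up as 0.707.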
Source: https://stackoverflow.com/questions/53585068/vector-space-model-query-vector-0-0-707-0-707-calculated