问题
from gensim.models import word2vec
sentences = word2vec.Text8Corpus('TextFile')
model = word2vec.Word2Vec(sentences, size=200, min_count = 2, workers = 4)
print model['king']
Is the output vector the context vector of 'king' or the word embedding vector of 'king'? How can I get both context vector of 'king' and the word embedding vector of 'king'? Thanks!
回答1:
It is the embedding vector for 'king'.
If you use hierarchical softmax, the context vectors are:
model.syn1
and if you use negative sampling they are:
model.syn1neg
The vectors can be accessed by:
model.syn1[model.vocab[word].index]
回答2:
'Context vector' is also a 'word embedding' vector. Word embedding means how vocabulary are mapped to vectors of real numbers.
I assume you meant center word's vector when you said 'word embedding' vector.
In word2vec algorithm, when you train the model, it creates two different vectors for one word (when 'king' is used for center word and when it's used for context words.)
I don't know about how gensim is treating these two vectors, but normally, people average both context and center words, or concatinate two vectors. It might not be the most beautiful way to treat the vectors, but it works very well that way.
So when you call model['king'] on some pre-trained vector, the vector you see is probably the averaged version of two vectors.
来源:https://stackoverflow.com/questions/39406092/how-to-get-both-the-word-embeddings-vector-and-context-vector-of-a-given-word-by