gensim word2vec accessing in/out vectors

一个人的身影 2021-02-07 09:02

In the word2vec model, there are two linear transforms that take a word in vocab space to a hidden layer (the "in" vector), and then back to the vocab space (the "out" vecto

4 answers
  • 2021-02-07 09:10

    To export the "out" vectors instead of the "in" vectors, change the following function in word2vec.py. It currently writes the "in" vectors, which are stored in the syn0 attribute; the "out" vectors are stored in the syn1neg attribute, so the row lookup should read from syn1neg:

    def save_word2vec_format(self, fname, fvocab=None, binary=False):
        # ... (rest of the function unchanged)
        # Read each row from syn1neg ("out") instead of syn0 ("in"):
        row = self.syn1neg[vocab.index]
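
As a rough sketch of reading both vectors from an already trained model without patching word2vec.py: the helper below, get_in_out_vectors, is a hypothetical name, not part of gensim. It assumes the model was trained with negative sampling (so syn1neg exists), that the "in" matrix is exposed as model.wv.vectors (older releases call it syn0), and that gensim 4.x keeps word indices in key_to_index while earlier releases keep them in vocab[word].index.

```python
def get_in_out_vectors(model, word):
    """Return (in_vector, out_vector) for `word` from a trained Word2Vec model.

    Assumption: the model uses negative sampling, so `model.syn1neg` exists.
    """
    wv = model.wv
    if hasattr(wv, "key_to_index"):
        # gensim >= 4.0 stores the word -> row index mapping here
        idx = wv.key_to_index[word]
    else:
        # gensim < 4.0 stores it on the vocab entry
        idx = wv.vocab[word].index
    in_vec = wv.vectors[idx]       # "in" weights (syn0 in older releases)
    out_vec = model.syn1neg[idx]   # "out" weights for negative sampling
    return in_vec, out_vec
```

Calling get_in_out_vectors(model, 'potato') on a trained model should then give the two rows for that word; both matrices are indexed by the same word index.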
    
  • 2021-02-07 09:16

    The code below saves and loads a model. It uses pickle internally, optionally mmap'ing the model's large internal NumPy matrices into virtual memory directly from disk files, for inter-process memory sharing.

    import gensim
    model.save('/tmp/mymodel.model')
    # Load from the same path that was passed to save().
    new_model = gensim.models.Word2Vec.load('/tmp/mymodel.model')
    

    Some background: Gensim is a free Python library designed to process raw, unstructured digital texts ("plain text"). The algorithms in Gensim, such as Latent Semantic Analysis, Latent Dirichlet Allocation, and Random Projections, discover the semantic structure of documents by examining statistical co-occurrence patterns of words within a corpus of training documents.

    Some good blog posts that describe usage and include sample code to get started on a project:

    • http://mccormickml.com/2016/04/12/googles-pretrained-word2vec-model-in-python/
    • https://rare-technologies.com/making-sense-of-word2vec/
    • https://rare-technologies.com/word2vec-tutorial/
    • https://rare-technologies.com/deep-learning-with-word2vec-and-gensim/

    Installation reference here

    Hope this helps!!!

  • 2021-02-07 09:21

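    To get the syn1 rows for any word, this might work. Note that syn1 only exists when the model is trained with hierarchical softmax (hs=1), and point holds the indices of the inner Huffman-tree nodes on the word's path, so the result is one row per inner node rather than a single vector for the word.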

    model.syn1[model.wv.vocab['potato'].point]
    

    where model is your trained word2vec model.
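
A small sketch of the same lookup wrapped in a function, in case it helps: get_hs_output_rows is a hypothetical helper name, and the gensim 4.x branch is an assumption (that per-word attributes such as point are read through wv.get_vecattr); the other branch is the vocab[word].point access from the line above. It only makes sense for a model trained with hs=1.

```python
def get_hs_output_rows(model, word):
    """Return the syn1 rows for the inner nodes on the Huffman path of `word`.

    Assumption: the model was trained with hierarchical softmax (hs=1),
    so `model.syn1` and the per-word `point` index arrays exist.
    """
    wv = model.wv
    if hasattr(wv, "get_vecattr"):
        # Assumed accessor for per-word attributes in gensim >= 4.0
        point = wv.get_vecattr(word, "point")
    else:
        # gensim < 4.0: attributes live on the vocab entry
        point = wv.vocab[word].point
    # One syn1 row per inner tree node on the word's path.
    return [model.syn1[int(i)] for i in point]
```

The result is a list of rows rather than one vector, because hierarchical softmax attaches output weights to tree nodes, not to words.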

  • 2021-02-07 09:30

    While this might not be a proper answer (I can't comment yet), no one has pointed this out: take a look here. The creator seems to answer a similar question there. That is also the place where you have a better chance of getting a valid answer.

    Following the link he posted into the word2vec source code, you could change the syn1 deletion to suit your needs. Just remember to delete it once you're done, since it proves to be a memory hog.
