How to calculate sentence similarity using gensim's word2vec model in Python

一向 2020-11-28 00:31

According to Gensim Word2Vec, I can use the word2vec model in the gensim package to calculate the similarity between two words.

e.g.

trained_model.simi         


        
14 answers
  • 2020-11-28 01:23

    You can use the Word Mover's Distance (WMD) algorithm. Here is an easy description of WMD.

    import gensim

    # Load a pretrained word2vec model; GoogleNews is used here
    model = gensim.models.KeyedVectors.load_word2vec_format('../GoogleNews-vectors-negative300.bin', binary=True)

    # Two sample sentences, tokenized: wmdistance expects lists of tokens, not raw strings
    s1 = 'the first sentence'.lower().split()
    s2 = 'the second text'.lower().split()

    # Calculate the distance between the two sentences using the WMD algorithm
    distance = model.wmdistance(s1, s2)

    print('distance = %.3f' % distance)
    

    P.S.: if you get an error about importing the pyemd library, you can install it with the following command:

    pip install pyemd
    
  • 2020-11-28 01:24

    I have tried the methods provided in the previous answers. They work, but the main drawback is that the longer the sentences, the larger the similarity becomes (to calculate the similarity, I use the cosine score of the mean embeddings of the two sentences), since the more words there are, the more positive semantic effects are added to the sentence.
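
    The mean-embedding cosine score mentioned above can be sketched roughly as follows. This is only an illustration, not this answer's exact code: the 3-dimensional vectors in `TOY_VECTORS` are made up, and the helper names (`mean_embedding`, `cosine`, `sentence_similarity`) are hypothetical. With gensim, the vectors would instead be looked up in a trained model.

```python
import math

# Toy 3-dimensional "word vectors", made up for illustration only.
# With a trained gensim model these would be looked up per word instead.
TOY_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "sat": [0.1, 0.9, 0.2],
    "ran": [0.2, 0.8, 0.3],
    "car": [0.0, 0.2, 0.9],
}


def mean_embedding(tokens, vectors):
    """Average the vectors of all tokens found in the vocabulary."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        raise ValueError("no in-vocabulary tokens in sentence")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sentence_similarity(s1, s2, vectors=TOY_VECTORS):
    """Cosine score between the mean embeddings of two whitespace-tokenized sentences."""
    e1 = mean_embedding(s1.lower().split(), vectors)
    e2 = mean_embedding(s2.lower().split(), vectors)
    return cosine(e1, e2)


if __name__ == "__main__":
    print(sentence_similarity("cat sat", "dog ran"))
    print(sentence_similarity("cat sat", "car"))
```

    Out-of-vocabulary words are simply skipped here, which is one possible design choice; a sentence with no known words raises a `ValueError` rather than returning a misleading score.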

    I think I should change my approach and use sentence embeddings instead, as studied in this paper and this one.
