Word2vec - get rank of similarity

Posted by 心不动则不痛 on 2021-02-10 12:57:05

Question


Given a word2vec model (trained with gensim), I want to get the similarity rank between two words. For example, let's say I have the word "desk" and the most similar words to "desk" are:

  1. table 0.64
  2. chair 0.61
  3. book 0.59
  4. pencil 0.52

I want to create a function such that:

f(desk, book) = 3, since book is the 3rd most similar word to desk. Does such a function exist? What is the most efficient way to do this?


Answer 1:


You can use rank(entity1, entity2) to get the rank of the second word by distance from the first. This is the same as its 1-based position in the list of most similar words.

model.wv.rank(sample_word, most_similar_word)
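To make the idea of a similarity rank concrete, here is a minimal pure-Python sketch of one way to compute it without sorting the whole vocabulary: count how many words are strictly more similar to the query than the target is, then add one. The names `cosine`, `similarity_rank`, and `toy_vectors` are hypothetical, and the two-dimensional vectors are made up for illustration. This is not gensim's implementation, only a sketch of the same idea as I understand it.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_rank(vectors, word, other):
    """1-based rank of `other` among all words, ordered by similarity to `word`."""
    target = cosine(vectors[word], vectors[other])
    # Count words (excluding the query itself) that are strictly closer than `other`.
    closer = sum(
        1
        for w, vec in vectors.items()
        if w != word and cosine(vectors[word], vec) > target
    )
    return closer + 1

# Made-up toy vectors, chosen so the ordering matches the question's example.
toy_vectors = {
    "desk": [1.0, 0.0],
    "table": [0.9, 0.1],
    "chair": [0.8, 0.3],
    "book": [0.6, 0.6],
    "pencil": [0.3, 0.9],
}

print(similarity_rank(toy_vectors, "desk", "book"))
```

A single pass over the vocabulary is enough here, which is why counting closer words tends to be cheaper than building and sorting the full neighbor list when only one rank is needed.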

A separate function like the one below isn't necessary here. It is kept for reference.

Assuming you have the words and their similarity scores in a list of tuples, as returned by model.wv.most_similar(sample_word):

[('table', 0.64), ('chair', 0.61), ('book', 0.59), ('pencil', 0.52)]

The following function accepts the sample word and the candidate similar word as parameters, and returns the 1-based rank as a single-element list (e.g. [3] for 'book') if the word is present in the output, or an empty list if it is not.

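def rank_of_most_similar_word(sample_word, most_similar_word):
    # most_similar() returns (word, similarity) tuples, most similar first (top 10 by default)
    neighbors = model.wv.most_similar(sample_word)
    # 1-based positions of the matching word; empty list if it is not among the neighbors
    return [i + 1 for i, (word, _) in enumerate(neighbors) if word == most_similar_word]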

sample_word = 'desk'
most_similar_word = 'book'
rank_of_most_similar_word(sample_word, most_similar_word)

Note: pass topn=x to model.wv.most_similar() to get the top x most similar words, as suggested in the comments. By default only the top 10 are returned, so a word outside that window produces an empty list.
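As a variation on the helper function, the sketch below takes the neighbor list directly and returns a single integer, or None when the word is absent, instead of a list. The name `rank_in_neighbors` is hypothetical, and the `neighbors` list simply reuses the example scores from the question, so no trained model is needed to try it.

```python
def rank_in_neighbors(neighbors, target):
    """Return the 1-based position of `target` in a list of (word, score) tuples.

    `neighbors` is assumed to be ordered from most to least similar, in the
    shape returned by most_similar(). Returns None if `target` is not listed.
    """
    for position, (word, _score) in enumerate(neighbors, start=1):
        if word == target:
            return position
    return None

# Example scores copied from the question.
neighbors = [("table", 0.64), ("chair", 0.61), ("book", 0.59), ("pencil", 0.52)]

print(rank_in_neighbors(neighbors, "book"))
print(rank_in_neighbors(neighbors, "lamp"))
```

With a real model, the list would presumably come from model.wv.most_similar(sample_word, topn=x), with x chosen large enough to include the word being looked up.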



Source: https://stackoverflow.com/questions/51747613/word2vec-get-rank-of-similarity
