How to save TensorFlow's word2vec model to a text/binary file for later kNN lookup?

别等时光非礼了梦想. Submitted on 2019-12-11 07:25:33

Question


I have trained a word2vec model in TensorFlow, but when I save the session, it only outputs the model.ckpt.data / .index / .meta files.

I was thinking of implementing a kNN method to retrieve the nearest words. I saw answers that use gensim, but how can I save my TensorFlow word2vec model to a .txt file first?


Answer 1:


Simply evaluate the embeddings matrix into a NumPy array and write it to a file along with the corresponding words. Sample code:

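import tensorflow as tf  # TensorFlow 1.x API

vocabulary_size = 50000
embedding_size = 128

# Assume your word to index map
word_to_idx = { ... }
# Assume your embeddings variable
embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], 0, 1))

with tf.Session() as sess:
  # Initialize the variable (or restore your trained checkpoint here instead)
  sess.run(tf.global_variables_initializer())
  # Evaluate the variable into a NumPy array of shape [vocabulary_size, embedding_size]
  embeddings_val = sess.run(embeddings)
  with open('embeddings.txt', 'w') as file_:
    for i in range(vocabulary_size):
      embed = embeddings_val[i, :]
      # Note: i is an index, so this lookup needs an index-to-word map (see the second answer)
      word = word_to_idx[i]
      # One line per word: the word followed by its space-separated vector
      file_.write('%s %s\n' % (word, ' '.join(map(str, embed))))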
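
Since the question is about running kNN on the saved vectors, here is a minimal sketch of reading the file back and ranking words by cosine similarity. It assumes the file has one 'word v1 v2 ...' line per word with no header line, as written above. The helper names load_embeddings, cosine and nearest_words are made up for this illustration, and the pure-Python loop is only meant for small vocabularies; for 50000 words a vectorized NumPy version or gensim would likely be faster.

```python
import heapq
import math


def load_embeddings(path):
    """Read 'word v1 v2 ...' lines into a {word: [float, ...]} dict."""
    vectors = {}
    with open(path) as file_:
        for line in file_:
            parts = line.rstrip('\n').split(' ')
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(vectors, query, k=5):
    """Return the k words most similar to `query`, excluding the query itself."""
    target = vectors[query]
    candidates = ((cosine(target, vec), word)
                  for word, vec in vectors.items() if word != query)
    return [word for _, word in heapq.nlargest(k, candidates)]
```

Usage would look like nearest_words(load_embeddings('embeddings.txt'), 'some_word', k=10), where 'some_word' stands for any word in your vocabulary.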



Answer 2:


I just had the same problem and tried Maxim's solution.

You need to replace the line:

word = word_to_idx[i]

with

word = idx_to_word[i]

You can simply reverse the word_to_idx dictionary with the following code:

idx_to_word = dict(zip(word_to_idx.values(), word_to_idx.keys()))

Except for that, his solution works fine.
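
Putting the fix together, here is a rough sketch that inverts the map and writes the rows in the word2vec text layout (a 'count dim' header line, then one 'word v1 v2 ...' line per word). The function names invert_vocab and write_word2vec_text are hypothetical, and rows is assumed to be the evaluated embeddings array (or any list of equal-length rows) ordered by index. A file in this layout should be readable by gensim's KeyedVectors.load_word2vec_format with binary=False, though I have not verified that against this exact output.

```python
def invert_vocab(word_to_idx):
    """Build the index -> word map needed when iterating over embedding rows."""
    return {idx: word for word, idx in word_to_idx.items()}


def write_word2vec_text(path, word_to_idx, rows):
    """Write a 'count dim' header, then one 'word v1 v2 ...' line per row."""
    idx_to_word = invert_vocab(word_to_idx)
    dim = len(rows[0]) if len(rows) else 0
    with open(path, 'w') as file_:
        file_.write('%d %d\n' % (len(rows), dim))
        for i, row in enumerate(rows):
            # Row i of the matrix belongs to the word whose index is i
            file_.write('%s %s\n' % (idx_to_word[i], ' '.join(map(str, row))))
```

With the names from the first answer, the call would be write_word2vec_text('embeddings.txt', word_to_idx, embeddings_val).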



Source: https://stackoverflow.com/questions/47873938/how-to-save-the-tensorflows-word2vec-in-text-binary-file-for-later-use-of-knn-o
