How to use pretrained Word2Vec model in Tensorflow

Submitted by 让人想犯罪 on 2019-12-10 04:01:52

Question


I have a Word2Vec model trained in Gensim. How can I use it in TensorFlow for word embeddings? I don't want to train the embeddings from scratch in TensorFlow. Can someone show me how to do it, with some example code?


Answer 1:


Let's assume you have a vocab dictionary and an inv_dict list, where each list index corresponds to one of the most common words:

vocab = {'hello': 0, 'world': 2, 'neural':1, 'networks':3}
inv_dict = ['hello', 'neural', 'world', 'networks']

Notice how each inv_dict index corresponds to a dictionary value. Now declare your embedding matrix and fill in its values:
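The two structures are mutual inverses, which is easy to sanity-check with a round-trip test (a small sketch, not part of the original answer):

```python
vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}
inv_dict = ['hello', 'neural', 'world', 'networks']

# Round-trip check: every word maps to an index that maps back to the word.
for word, idx in vocab.items():
    assert inv_dict[idx] == word
```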

import numpy as np

vocab_size = len(inv_dict)
emb_size = 300  # or whatever the size of your embeddings is
embeddings = np.zeros((vocab_size, emb_size))

from gensim.models.keyedvectors import KeyedVectors                         
model = KeyedVectors.load_word2vec_format('embeddings_file', binary=True)

for k, v in vocab.items():
  embeddings[v] = model[k]
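One caveat: if any vocabulary word is missing from the pretrained file, `model[k]` raises a `KeyError`. A common workaround (a sketch, not part of the original answer; the dict here is a stand-in for the gensim model) is to give unseen words small random vectors:

```python
import numpy as np

vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}
emb_size = 300
embeddings = np.zeros((len(vocab), emb_size))

# Stand-in for the gensim KeyedVectors model: it only knows two of the words.
pretrained = {'hello': np.ones(emb_size), 'world': np.full(emb_size, 2.0)}

for word, idx in vocab.items():
    if word in pretrained:
        embeddings[idx] = pretrained[word]
    else:
        # Words absent from the pretrained file get small random vectors
        # instead of raising a KeyError.
        embeddings[idx] = np.random.uniform(-0.25, 0.25, emb_size)
```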

You've got your embeddings matrix. Good. Now suppose you want to train on the sample x = ['hello', 'world']. Strings won't work for our neural net, so we need to integerize:

x_train = []
for word in x:  
  x_train.append(vocab[word]) # integerize
x_train = np.array(x_train) # make into numpy array
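If the input might contain out-of-vocabulary words, `dict.get` with a reserved index avoids a `KeyError` (the `unk_id` here is a hypothetical convention, not part of the original answer; it assumes you reserve one extra row in the embedding matrix):

```python
import numpy as np

vocab = {'hello': 0, 'world': 2, 'neural': 1, 'networks': 3}
unk_id = len(vocab)  # hypothetical <UNK> index; reserve an extra embedding row for it

x = ['hello', 'unknown_word', 'world']
x_train = np.array([vocab.get(w, unk_id) for w in x])
print(x_train)  # -> [0 4 2]
```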

Now we are good to go with embedding our samples on the fly:

import tensorflow as tf  # TF 1.x API

x_model = tf.placeholder(tf.int32, shape=[None, input_size])
with tf.device("/cpu:0"):
  embedded_x = tf.nn.embedding_lookup(embeddings, x_model)

Now embedded_x goes into your convolution or whatever comes next. I am also assuming you are not retraining the embeddings, but simply using them. Hope that helps.
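The lookup itself is just row indexing. In NumPy terms (a sketch with a tiny toy matrix, emb_size of 3 instead of 300), `tf.nn.embedding_lookup(embeddings, x_train)` behaves like fancy indexing:

```python
import numpy as np

# Toy embedding matrix: 4 words, 3-dimensional vectors.
embeddings = np.array([[1., 1., 1.],
                       [2., 2., 2.],
                       [3., 3., 3.],
                       [4., 4., 4.]])

x_train = np.array([[0, 2]])  # one integerized sample: ['hello', 'world']

# Equivalent of tf.nn.embedding_lookup(embeddings, x_train):
embedded_x = embeddings[x_train]
print(embedded_x.shape)  # -> (1, 2, 3): batch, sequence length, emb_size
```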



Source: https://stackoverflow.com/questions/43070656/how-to-use-pretrained-word2vec-model-in-tensorflow
