I am interested in initialising the TensorFlow seq2seq implementation with pretrained word2vec embeddings.
I have looked at the code. It seems the embedding is initialized
with
I think you already got your answer on the mailing list, but I am putting it here for posterity.
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/bH6S98NpIJE
You can initialize it randomly and afterwards do: session.run(embedding.assign(my_word2vec_matrix))
This will override the init values.
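One thing to watch: the assign only gives sensible results if row i of my_word2vec_matrix holds the vector for token id i in the model's own vocabulary. Here is a minimal sketch of building such a matrix with NumPy. The helper name build_embedding_matrix, the toy vocab list, and the plain dict standing in for a loaded word2vec model are all made up for illustration, and filling tokens that have no pretrained vector with small uniform noise is an assumption, not something the seq2seq code prescribes.

```python
import numpy as np


def build_embedding_matrix(vocab, pretrained, dim, scale=0.1, seed=0):
    """Return (matrix, found): row i of matrix is the vector for vocab[i]."""
    rng = np.random.default_rng(seed)
    # Start every row with small uniform noise, as a random initializer would.
    matrix = rng.uniform(-scale, scale, size=(len(vocab), dim)).astype(np.float32)
    found = 0
    for idx, token in enumerate(vocab):
        # Overwrite the row only when a pretrained vector exists for the token.
        if token in pretrained:
            matrix[idx] = pretrained[token]
            found += 1
    return matrix, found


# Hypothetical stand-in for a loaded word2vec model: token -> vector.
pretrained = {
    "hello": np.ones(4, dtype=np.float32),
    "world": np.full(4, 2.0, dtype=np.float32),
}
# Hypothetical model vocabulary; the list index is the token id.
vocab = ["_PAD", "_UNK", "hello", "world"]

my_word2vec_matrix, found = build_embedding_matrix(vocab, pretrained, dim=4)
print(my_word2vec_matrix.shape, found)  # (4, 4) 2
```

The resulting array has the same shape as the embedding variable, so it can be passed to embedding.assign(...) as described above.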
This seems to work for me. I believe trainable=False is needed to keep the values fixed?
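import tensorflow as tf
from gensim.models import KeyedVectors
# load word2vec model with gensim (FILENAME is the path to a binary word2vec file)
model = KeyedVectors.load_word2vec_format(FILENAME, binary=True)
# embedding matrix (model.syn0 in older gensim releases)
X = model.vectors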
print(type(X)) # numpy.ndarray
print(X.shape) # (vocab_size, embedding_dim)
# start interactive session
sess = tf.InteractiveSession()
# set embeddings
embeddings = tf.Variable(tf.random_uniform(X.shape, minval=-0.1, maxval=0.1), trainable=False)
# initialize (tf.initialize_all_variables() in very old TensorFlow releases)
sess.run(tf.global_variables_initializer())
# override inits
sess.run(embeddings.assign(X))