I am interested in initializing the TensorFlow seq2seq implementation with pretrained word2vec embeddings. I have looked at the code; it seems the embedding is initialized with:
You can change the tokenizer in tensorflow/models/rnn/translate/data_utils.py to use a pre-trained word2vec model for tokenizing. Lines 187-190 of data_utils.py:
if tokenizer:
    words = tokenizer(sentence)
else:
    words = basic_tokenizer(sentence)
fall back to basic_tokenizer when no tokenizer is supplied. You can write a tokenizer method that uses a pre-trained word2vec model for tokenizing sentences and pass it in, along the lines of the sketch below.
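For example, here is a minimal sketch of such a tokenizer, assuming gensim is installed; the model path is a placeholder, not part of the original code. It matches the interface implied by the snippet above (takes a raw sentence string, returns a list of word tokens) and keeps only tokens that appear in the word2vec vocabulary:

from gensim.models import KeyedVectors

# Load the pretrained vectors once at module load time.
# The path below is a placeholder; point it at your own model file.
word2vec = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def word2vec_tokenizer(sentence):
    # Same interface as basic_tokenizer: take a raw sentence string
    # and return a list of word tokens.
    words = []
    for token in sentence.strip().split():
        if token in word2vec:  # keep only in-vocabulary words
            words.append(token)
        # Out-of-vocabulary tokens could instead be mapped to _UNK here.
    return words

You would then pass word2vec_tokenizer as the tokenizer argument to the data-preparation functions in data_utils.py (such as create_vocabulary and data_to_token_ids), so the check shown above picks it up instead of basic_tokenizer.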