word-embedding

Sentence embedding in Keras

Submitted by 时光怂恿深爱的人放手 on 2020-01-23 03:59:25
Question: I am trying simple document classification using sentence embeddings in Keras. I know how to feed word vectors to a network, but I have trouble using sentence embeddings. In my case, I have a simple representation of each sentence (the word vectors summed along the axis, for example np.sum(sequences, axis=0)). My question is: what should I replace the Embedding layer with in the code below so that I can feed sentence embeddings instead? model = Sequential() model.add(Embedding(len(embedding_weights), …
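A common way to handle this (a minimal sketch rather than the asker's actual model, with EMBED_SIZE and NUM_CLASSES as assumed constants) is to drop the Embedding layer entirely: each document has already been collapsed into one fixed-size vector, so it can be fed straight into a Dense stack.

```python
from keras.models import Sequential
from keras.layers import Dense

EMBED_SIZE = 300   # dimensionality of the precomputed sentence vectors (assumed)
NUM_CLASSES = 5    # number of document classes (assumed)

# Each document is already a single vector, e.g. np.sum(word_vectors, axis=0),
# so no Embedding lookup is needed: the network starts from dense features.
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(EMBED_SIZE,)))
model.add(Dense(NUM_CLASSES, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(X, y) with X of shape (num_documents, EMBED_SIZE) and one-hot y
```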

Assign custom word vector to UNK token during prediction?

Submitted by 匆匆过客 on 2020-01-16 18:00:10
Question: I use a TensorFlow embedding layer for a classification model, like this: with tf.variable_scope('embeddings'): word_embeddings = tf.constant(self.embedding_mat, dtype=tf.float32, name="embedding") self.embedded_x1 = tf.nn.embedding_lookup(word_embeddings, self.x1) self.embedded_x2 = tf.nn.embedding_lookup(word_embeddings, self.x2) If I have a UNK token in my embedding matrix but did not use this UNK token during training, can I assign it a custom vector (e.g. from fastText) during prediction? So, for…
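One way to make this possible is to feed the embedding matrix through a placeholder instead of baking it in with tf.constant, so the UNK row can be overwritten at prediction time. This is a sketch assuming TF 1.x, a UNK row at index 0, and an available fastText vector; none of these are confirmed by the question.

```python
import numpy as np
import tensorflow as tf  # TF 1.x API, matching the question's code

vocab_size, embed_dim = 50000, 300   # assumed sizes
UNK_ID = 0                           # assumed index of the UNK token

# A placeholder (instead of tf.constant) keeps the matrix out of the graph,
# so any row can be swapped per session.run call.
embedding_ph = tf.placeholder(tf.float32, [vocab_size, embed_dim], name="embedding")
token_ids = tf.placeholder(tf.int32, [None, None], name="token_ids")
embedded = tf.nn.embedding_lookup(embedding_ph, token_ids)

# At prediction time, copy the trained matrix and overwrite the UNK row with
# a custom vector, e.g. one produced by fastText for the unseen word:
embedding_mat = np.random.rand(vocab_size, embed_dim).astype(np.float32)  # stands in for the trained matrix
custom_unk = np.random.rand(embed_dim).astype(np.float32)                 # stands in for the fastText vector
embedding_mat[UNK_ID] = custom_unk

with tf.Session() as sess:
    vecs = sess.run(embedded, feed_dict={embedding_ph: embedding_mat,
                                         token_ids: [[0, 1, 2]]})
```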

Getting an error when adding an embedding layer to an LSTM autoencoder

Submitted by 江枫思渺然 on 2019-12-31 01:52:10
Question: I have a seq2seq model that works fine. I want to add an embedding layer to this network, but I run into an error when I do. This is my architecture using pretrained word embeddings, which works fine (the code is almost the same as the code available here, but I want to include the Embedding layer in the model rather than using the pretrained embedding vectors): LATENT_SIZE = 20 inputs = Input(shape=(SEQUENCE_LEN, EMBED_SIZE), name="input") encoded = Bidirectional(LSTM(LATENT_SIZE), merge…
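The usual cause of this error is that an Embedding layer consumes integer token ids, not precomputed vectors, so the Input shape must lose its EMBED_SIZE axis. Below is a minimal sketch of the adjusted autoencoder; VOCAB_SIZE and the softmax decoder head are assumptions, since the question is cut off before the decoder.

```python
from keras.models import Model
from keras.layers import (Input, Embedding, LSTM, Bidirectional,
                          RepeatVector, TimeDistributed, Dense)

SEQUENCE_LEN, EMBED_SIZE, LATENT_SIZE = 50, 100, 20  # assumed to match the question
VOCAB_SIZE = 10000                                   # assumed vocabulary size

# The input is now a sequence of integer token ids; the Embedding layer
# produces the (SEQUENCE_LEN, EMBED_SIZE) tensor the encoder expects.
inputs = Input(shape=(SEQUENCE_LEN,), name="input")
embedded = Embedding(VOCAB_SIZE, EMBED_SIZE, input_length=SEQUENCE_LEN)(inputs)
encoded = Bidirectional(LSTM(LATENT_SIZE), merge_mode="sum", name="encoder")(embedded)
decoded = RepeatVector(SEQUENCE_LEN)(encoded)
decoded = Bidirectional(LSTM(EMBED_SIZE, return_sequences=True), merge_mode="sum")(decoded)
# With a trainable Embedding, the targets can no longer be raw embedding
# vectors; predicting token ids with a softmax over the vocabulary is one option.
outputs = TimeDistributed(Dense(VOCAB_SIZE, activation="softmax"))(decoded)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```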

What embedding-layer output_dim is really needed for a dictionary of just 10000 words?

Submitted by 丶灬走出姿态 on 2019-12-24 03:04:30
Question: I'm training up an RNN with a very reduced set of word features, around 10,000. I was planning on starting with an embedding layer before adding the RNNs, but it is very unclear to me what dimensionality is really needed. I know that I can try out different values (32, 64, etc.), but I'd rather have some intuition first. For example, if I use a 32-dimensional embedding vector, then only 3 different values are needed per dimension to fully describe the space (3**32 >> 10000).
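There is no exact answer, but one hedged starting point (not part of the question itself) is the fourth-root heuristic that circulates in the TensorFlow documentation: set the embedding dimension near vocab_size ** 0.25 and tune from there on validation data.

```python
# Fourth-root rule of thumb: a heuristic, not a hard answer.
vocab_size = 10000
embedding_dim = round(vocab_size ** 0.25)   # = 10 for a 10,000-word vocabulary
print(embedding_dim)
```

In practice, dimensions of 16 to 128 are common for a vocabulary this size; capacity for encoding word relationships, not raw representability, drives the choice, since even two distinct values per dimension already yield 2**32 distinguishable points.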

gensim word2vec: updating word embeddings with newly arriving data

Submitted by 冷暖自知 on 2019-12-23 04:52:51
Question: I have trained on 26 million tweets with the skip-gram technique to create word embeddings, as follows: sentences = gensim.models.word2vec.LineSentence('/.../data/tweets_26M.txt') model = gensim.models.word2vec.Word2Vec(sentences, window=2, sg=1, size=200, iter=20) model.save_word2vec_format('/.../savedModel/Tweets26M_All.model.bin', binary=True) However, I am continuously collecting more tweets in my database. For example, when I have 2 million more tweets, I want to update my embeddings with them as well…
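A minimal sketch of incremental training follows, assuming gensim < 4.0 to match the question's size=/iter= arguments; the new-tweets file name is invented. One important caveat: a model saved with save_word2vec_format keeps only the raw vectors and cannot be trained further, so the full model has to be saved with model.save() instead.

```python
import gensim

# Initial training: save the *full* model, not just the word2vec-format vectors.
sentences = gensim.models.word2vec.LineSentence('/.../data/tweets_26M.txt')
model = gensim.models.word2vec.Word2Vec(sentences, window=2, sg=1, size=200, iter=20)
model.save('/.../savedModel/Tweets26M_All.model')

# Later, when 2 million new tweets have accumulated (file name assumed):
new_sentences = gensim.models.word2vec.LineSentence('/.../data/tweets_2M_new.txt')
model = gensim.models.word2vec.Word2Vec.load('/.../savedModel/Tweets26M_All.model')
model.build_vocab(new_sentences, update=True)   # add newly seen words to the vocabulary
model.train(new_sentences, total_examples=model.corpus_count, epochs=model.iter)
```

Note that words appearing only in the old corpus keep their existing vectors while their neighbors move toward the new data; that drift is a known trade-off of incremental updates.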

Bigram to a vector

Submitted by 血红的双手。 on 2019-12-21 12:28:10
Question: I want to construct word embeddings for documents using the word2vec tool. I know how to find the vector embedding corresponding to a single word (a unigram). Now I want to find a vector for a bigram. Is that possible with word2vec? If yes, how? Answer 1: The following snippet will get you the vector representation of a bigram. Note that the bigram you want to convert to a vector needs to have an underscore instead of a space between the words, e.g. bigram2vec(unigrams, "this report") is wrong; it…
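The answer's snippet is truncated above; what follows is a hedged reconstruction of the idea using gensim's Phrases, assuming gensim < 4.0, with the bigram2vec signature inferred from the visible call. Phrases joins frequently co-occurring word pairs with an underscore, which is why the query must be "this_report" rather than "this report".

```python
from gensim.models import Word2Vec
from gensim.models.phrases import Phrases, Phraser

def bigram2vec(unigrams, bigram_to_search):
    # unigrams: a list of tokenized sentences, e.g. [['this', 'report', ...], ...]
    phrases = Phrases(unigrams, min_count=1, threshold=0.1)
    bigram = Phraser(phrases)
    # Train word2vec on the corpus with detected bigrams merged into single tokens.
    model = Word2Vec(bigram[unigrams], size=100, min_count=1)
    if bigram_to_search in model.wv.vocab:
        return model.wv[bigram_to_search]
    return None

sentences = [['this', 'report', 'is', 'long'], ['this', 'report', 'is', 'short']]
print(bigram2vec(sentences, 'this_report'))
```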

How to make TensorFlow Hub embeddings servable using TensorFlow Serving?

Submitted by 南楼画角 on 2019-12-21 11:04:07
Question: I am trying to use an embeddings module from TensorFlow Hub as a servable. I am new to TensorFlow. Currently, I am using the Universal Sentence Encoder embeddings as a lookup to convert sentences to embeddings, and then using those embeddings to find the similarity to another sentence. My current code to convert sentences into embeddings is: with tf.Session() as session: session.run([tf.global_variables_initializer(), tf.tables_initializer()]) sen_embeddings = session.run(self.embed(prepared_text))
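One path to a servable, sketched under the assumption of TF 1.x and the public USE v2 module URL (neither is stated in the question), is to instantiate the hub module in a fresh graph and export it as a SavedModel. The version-numbered export directory is the layout TensorFlow Serving expects.

```python
import tensorflow as tf          # TF 1.x, matching the question's session-style code
import tensorflow_hub as hub

export_dir = "/tmp/use_servable/1"   # assumed path; "1" is the model version

with tf.Graph().as_default():
    text_input = tf.placeholder(tf.string, shape=[None], name="text")
    embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder/2")
    embeddings = embed(text_input)

    with tf.Session() as session:
        session.run([tf.global_variables_initializer(), tf.tables_initializer()])
        # legacy_init_op ensures the module's lookup tables are initialized
        # when TensorFlow Serving loads the SavedModel.
        tf.saved_model.simple_save(
            session, export_dir,
            inputs={"text": text_input},
            outputs={"embeddings": embeddings},
            legacy_init_op=tf.tables_initializer())
```

The exported directory can then be served with, for example, tensorflow_model_server --model_name=use --model_base_path=/tmp/use_servable, and the similarity to another sentence computed client-side from the returned vectors.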

Visualize Gensim Word2vec Embeddings in Tensorboard Projector

Submitted by 梦想与她 on 2019-12-20 21:54:25
Question: I've only seen a few questions that ask this, and none of them have an answer yet, so I thought I might as well try. I've been using gensim's word2vec model to create some vectors. I exported them into text and tried importing them into TensorFlow's live demo of the Embedding Projector. One problem: it didn't work. It told me that the tensors were improperly formatted. So, being a beginner, I thought I would ask some people with more experience about possible solutions. Equivalent to my code: …
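The live projector at projector.tensorflow.org is strict about its input: one TSV file of tab-separated vector values with no header and no word column, plus a separate metadata TSV with one label per line. Here is a minimal sketch that writes both files from a gensim model (the model path is assumed; index2word is the gensim < 4.0 attribute).

```python
from gensim.models import Word2Vec

model = Word2Vec.load("my_word2vec.model")   # assumed model path

with open("tensors.tsv", "w", encoding="utf-8") as tensors, \
     open("metadata.tsv", "w", encoding="utf-8") as metadata:
    for word in model.wv.index2word:
        metadata.write(word + "\n")                                      # one label per line
        tensors.write("\t".join(str(x) for x in model.wv[word]) + "\n")  # tab-separated values, no header
```

gensim also ships a converter for word2vec-format files that produces the same pair of TSVs: python -m gensim.scripts.word2vec2tensor -i vectors.txt -o my_model.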
