initialising Seq2seq embedding with pretrained word2vec

前端未结

关注

 2  559

I am interested in initialising tensorflow seq2seq implementation with pretrained word2vec.

I have seen the code. It seems embedding is initialized

with


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  一生所求        
                
              
                            
                2021-02-09 04:06
              
            
            
                                                                       
I think you've gotten your answer in the mailing list but I am putting it here for posterity.

https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/bH6S98NpIJE


  You can initialize it randomly and afterwards do:
  session.run(embedding.assign(my_word2vec_matrix))
  
  This will override the init values.


This seems to work for me. I believe trainable=False is needed to keep the values fixed?

# load word2vec model (say from gensim)
model = load_model(FILENAME, binary=True)

# embedding matrix
X = model.syn0
print(type(X)) # numpy.ndarray
print(X.shape) # (vocab_size, embedding_dim)

# start interactive session
sess = tf.InteractiveSession()

# set embeddings
embeddings = tf.Variable(tf.random_uniform(X.shape, minval=-0.1, maxval=0.1), trainable=False)

# initialize
sess.run(tf.initialize_all_variables())

# override inits
sess.run(embeddings.assign(X))

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  庸人自扰        
                
              
                            
                2021-02-09 04:06
              
            
            
                                                                       
You can change the tokanizer present in tensorflow/models/rnn/translate/data_utils.py to use a pre-trained word2vec model for tokenizing. The lines 187-190 of data_utils.py:

if tokenizer:
    words = tokenizer(sentence)
else:
    words = basic_tokenizer(sentence)


use basic_tokenizer. You can write a tokenizer method that uses a pre-trained word2vec model for tokenizing the sentences.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复