I need to fine-tune my word2vec model. I have two datasets, data1 and data2.
What I did so far is:
model = gensim.models.Word2Vec(data1)
Is this correct?
Yes, it is. But you need to make sure that data2's words are in the vocabulary built from data1; any words that aren't present in the vocabulary will be lost (they are silently ignored during training).
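One way to avoid losing those words is to add data2's vocabulary to the model before calling train on it; gensim's build_vocab supports this with update=True. A minimal sketch, assuming gensim 4.x, that model was already built on data1, and that both corpora are lists of tokenized sentences:

# Words from data2 that the model doesn't know are silently skipped during training.
missing = {w for sentence in data2 for w in sentence if w not in model.wv.key_to_index}
print(missing)

# Extend the existing vocabulary with data2's words before fine-tuning.
model.build_vocab(data2, update=True)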
Note that the weights that will be computed by
model.train(data1, total_examples=len(data1), epochs=epochs)
and
model.train(data2, total_examples=len(data2), epochs=epochs)
won't be equal to the weights computed by
model.train(data1+data2, total_examples=len(data1+data2), epochs=epochs)
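To make the difference concrete, here is a sketch of the two alternatives (gensim 4.x; the corpora are toy stand-ins). Fine-tuning runs data2 over weights already shaped by data1, while the combined run sees sentences from both corpora in every epoch, so the two schedules generally end up with different weights:

import gensim

# Toy stand-ins; in practice these are lists of tokenized sentences.
data1 = [["the", "cat", "sat"], ["the", "dog", "barked"]]
data2 = [["the", "cat", "meowed"], ["the", "bird", "sang"]]
epochs = 5

# Option 1: fine-tune. data2 nudges weights already learned from data1.
model_a = gensim.models.Word2Vec(data1, vector_size=50, min_count=1)
model_a.build_vocab(data2, update=True)
model_a.train(data2, total_examples=len(data2), epochs=epochs)

# Option 2: one combined run over the concatenated corpus.
model_b = gensim.models.Word2Vec(data1 + data2, vector_size=50, min_count=1, epochs=epochs)

# Same word, different vectors under the two schedules.
print(model_a.wv["cat"])
print(model_b.wv["cat"])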
Do I need to store the learned weights somewhere?
No, you don't need to.
But if you want, you can save the trained model to a file so you can reuse it later.
model.save("word2vec.model")
And you load it back with
model = gensim.models.Word2Vec.load("word2vec.model")
(source)
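If you only need the vectors for lookup and not further training, a lighter alternative (again assuming gensim 4.x) is to save just the KeyedVectors from the model above:

from gensim.models import KeyedVectors

# Save only the word vectors, without the full training state.
model.wv.save("word2vec.wordvectors")

# Reload them later; vectors loaded this way can be queried but not trained further.
wv = KeyedVectors.load("word2vec.wordvectors")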
I need to fine-tune my word2vec model.
Note that "Word2vec training is an unsupervised task, there’s no good way to objectively evaluate the result. Evaluation depends on your end application." (source) But there's some evaluations that you can look-up here ("How to measure quality of the word vectors" section)
Hope that helps!