How do I classify documents with scikit-learn using TfidfVectorizer?

你的背包 2021-02-11 02:44

The following example shows how one can train a classifier with the scikit-learn 20 newsgroups dataset.

>>> from sklearn.feature_extraction.text import TfidfVec


      
      
        
2 answers

借酒劲吻你 (original poster) 2021-02-11 03:10

In general, for sklearn the flow is:


1. Convert your string data to numeric values using a vectorizer, e.g. TfidfVectorizer or CountVectorizer.
2. Fit the vectorizer and transform the data.
3. Pass the result to the fit method of the classifier of your choice.
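
The three steps above can be sketched as follows. This is only a minimal illustration: the four-sentence corpus and its labels are made up for the example, and MultinomialNB is just one classifier that accepts TF-IDF features, not a recommendation from the original answer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy corpus and labels, invented for illustration only.
texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new laptop has a fast processor",
    "the phone ships with more memory",
]
labels = ["sport", "sport", "tech", "tech"]

# Steps 1 and 2: fit the vectorizer and turn the strings
# into a sparse TF-IDF matrix in a single call.
vectorizer = TfidfVectorizer()
x_train = vectorizer.fit_transform(texts)

# Step 3: train a classifier on the numeric features.
classifier = MultinomialNB()
classifier.fit(x_train, labels)

# One row per document, one column per vocabulary term.
print(x_train.shape)
```

Any other scikit-learn estimator that accepts sparse input could be swapped in for the classifier without changing the vectorizer step.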


You did not mention your data format, but if it is a CSV file with rows of text, the flow could be:


1. Read each row of text.
2. Preprocess it, e.g. remove the stop words.
3. raw_data_list = [row1, row2, ..., rown]
4. vectorizer = TfidfVectorizer()
5. x_transformed = vectorizer.fit_transform(raw_data_list)
6. x_transformed can then be passed to the fit/train function of a classifier.
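
A rough sketch of that CSV flow is shown below. Several things here are assumptions rather than part of the original answer: the column names "text" and "label", the in-memory io.StringIO standing in for a real file, and LogisticRegression as the classifier. Stop-word removal is delegated to the vectorizer's stop_words="english" option instead of a separate preprocessing pass.

```python
import csv
import io

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Stand-in for a CSV file on disk; with a real file you would use
# open(path, newline="") instead. Column names are assumptions.
csv_file = io.StringIO(
    "text,label\n"
    "The match ended with a late goal,sport\n"
    "The coach praised the team,sport\n"
    "The laptop has a faster processor,tech\n"
    "The phone got a software update,tech\n"
)

# Read each row and collect the text and its label separately.
raw_data_list = []
labels = []
for row in csv.DictReader(csv_file):
    raw_data_list.append(row["text"])
    labels.append(row["label"])

# stop_words="english" drops common English stop words while tokenizing.
vectorizer = TfidfVectorizer(stop_words="english")
x_transformed = vectorizer.fit_transform(raw_data_list)

# Train the classifier on the TF-IDF matrix.
classifier = LogisticRegression()
classifier.fit(x_transformed, labels)

print(sorted(vectorizer.vocabulary_))
```

Printing the vocabulary is a quick way to confirm that the stop words were actually removed before training.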


Once you have a trained classifier, you can call predict on new data.
Remember to convert the new data to the same format as the training data by running it through the same fitted vectorizer (call transform, not fit_transform) before passing it to classifier.predict.
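
The prediction step could look roughly like this. The training sentences, the new documents, and the choice of MultinomialNB are all illustrative assumptions; the part that matters is that the new text goes through transform on the already fitted vectorizer, so it is mapped onto the same columns as the training matrix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training data, invented for illustration only.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the new laptop has a fast processor",
    "the phone ships with more memory",
]
train_labels = ["sport", "sport", "tech", "tech"]

vectorizer = TfidfVectorizer()
x_train = vectorizer.fit_transform(train_texts)
classifier = MultinomialNB().fit(x_train, train_labels)

# Unseen documents: reuse the fitted vectorizer with transform().
# Calling fit_transform() here would build a new vocabulary and
# produce columns that no longer match what the classifier learned.
new_texts = [
    "a late goal decided the football match",
    "laptop with more memory and a fast processor",
]
x_new = vectorizer.transform(new_texts)
predictions = classifier.predict(x_new)

for text, label in zip(new_texts, predictions):
    print(label, "<-", text)
```

Words that never appeared in the training data (such as "decided" here) are simply ignored by transform, since they have no column in the fitted vocabulary.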
    
             
                                                        
            
            
              
                