How to generate a custom cross-validation generator in scikit-learn?

萌比男神i 2021-01-31 20:08
I have an unbalanced dataset, so I have an oversampling strategy that I apply only to the training data. I'd like to use scikit-learn classes like GridSearch

      
      
        
温柔的废话 (original poster), 2021-01-31 20:33
              

            
            
                        
scikit-learn provides a workaround for this with its LabelKFold iterator (renamed GroupKFold and moved to sklearn.model_selection in scikit-learn 0.18):


  LabelKFold is a variation of k-fold which ensures that the same label is not in both testing and training sets. This is necessary for example if you obtained data from different subjects and you want to avoid over-fitting (i.e., learning person specific features) by testing and training on different subjects.


To use this iterator with oversampling, first create a column in your dataframe (e.g. cv_label) that stores the index value of each row.

df['cv_label'] = df.index


Then apply your oversampling, making sure the cv_label column is copied along with the rest of each row. After oversampling, this column will contain duplicate values, one for every copy of an original row. Store these labels in a separate series or list for later use:

cv_labels = df['cv_label']
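
As a rough illustration of the two steps above, here is a minimal sketch of random oversampling with pandas that carries cv_label along. The toy dataframe, the column names x and y, and the use of DataFrame.sample are assumptions made for illustration; the answer does not say which oversampling method is used.

```python
import pandas as pd

# Toy, imbalanced data (hypothetical): 6 rows of class 0, 2 rows of class 1.
df = pd.DataFrame({
    "x": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 1.5, 1.6],
    "y": [0, 0, 0, 0, 0, 0, 1, 1],
})

# Record each row's original identity before any duplication.
df["cv_label"] = df.index

# Naive random oversampling: resample the minority class with replacement
# until it matches the majority class in size.
majority = df[df["y"] == 0]
minority = df[df["y"] == 1]
minority_upsampled = minority.sample(n=len(majority), replace=True, random_state=0)
oversampled = pd.concat([majority, minority_upsampled], ignore_index=True)

# Duplicated rows share the cv_label of the row they were copied from.
cv_labels = oversampled["cv_label"]
```

Because every copy keeps the cv_label of its source row, a group-aware splitter can later keep all copies of a row on the same side of each split.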


Be aware that you need to remove this column from your dataframe before running your cross-validator/classifier; otherwise it would be used as a feature.

After separating your data into features (not including cv_label) and labels, create the LabelKFold iterator and pass it to the cross-validation function you need:

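from sklearn import svm
# sklearn.cross_validation exists only in scikit-learn < 0.20
from sklearn.cross_validation import LabelKFold, cross_val_predict
clf = svm.SVC(C=1)
lkf = LabelKFold(cv_labels, n_folds=5)
predicted = cross_val_predict(clf, features, labels, cv=lkf)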
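
On scikit-learn 0.18 and later, the same idea can be written with GroupKFold from sklearn.model_selection. The sketch below is one possible translation, not part of the original answer. The synthetic features, labels and cv_labels arrays are hypothetical stand-ins for already oversampled data in which every original row appears twice.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import GroupKFold, cross_val_predict

# Hypothetical oversampled data: 20 original rows, each duplicated once.
rng = np.random.RandomState(0)
base_features = rng.normal(size=(20, 3))
base_labels = np.array([0, 1] * 10)
features = np.vstack([base_features, base_features])
labels = np.concatenate([base_labels, base_labels])
cv_labels = np.concatenate([np.arange(20), np.arange(20)])  # original row ids

clf = svm.SVC(C=1)
gkf = GroupKFold(n_splits=5)

# Groups are passed at split time rather than to the constructor.
predicted = cross_val_predict(clf, features, labels, groups=cv_labels, cv=gkf)

# Check that no original row ends up on both sides of any split.
for train_idx, test_idx in gkf.split(features, labels, groups=cv_labels):
    assert not set(cv_labels[train_idx]) & set(cv_labels[test_idx])
```

The main difference from LabelKFold is where the labels go: GroupKFold takes only the number of splits, and the group labels are supplied through the groups argument of the cross-validation function.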

    
             
                                                        
            
            
              
                