A TypeError with VotingClassifier


I want to use VotingClassifier, but I run into problems when cross-validating it:

    x_train, x_validation, y_train, y_validation = train_test_split(x, y, test


                      
1 answer

借酒劲吻你, 2021-01-16 20:56

The error comes from this line in VotingClassifier's predict():

    np.bincount(x, weights=self._weights_not_none)


Here x holds the predictions returned by the individual classifiers inside the VotingClassifier.
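
To make that step concrete, here is a simplified sketch of a hard (majority) vote built on np.bincount. It mirrors the idea behind VotingClassifier but is not scikit-learn's actual code; the function name majority_vote and the sample arrays are made up for illustration.

```python
import numpy as np

def majority_vote(predictions, weights=None):
    """Return the most voted class index for each row.

    predictions: integer array of shape (n_samples, n_estimators),
    one column per classifier (hypothetical layout for this sketch).
    """
    return np.apply_along_axis(
        # Count the (optionally weighted) votes per class, pick the largest
        lambda row: np.argmax(np.bincount(row, weights=weights)),
        axis=1,
        arr=predictions,
    )

# Three classifiers voting on four samples (made-up values)
preds = np.array([[0, 0, 1],
                  [1, 1, 0],
                  [2, 1, 2],
                  [0, 1, 1]], dtype=np.int64)

votes = majority_vote(preds)                        # unweighted vote
weighted = majority_vote(preds, weights=[3, 1, 1])  # first classifier counts triple
```

Each row is passed to np.bincount as x, which is why every classifier's predictions must be non-negative integers.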

According to the documentation of np.bincount:


  Count number of occurrences of each value in array of non-negative
  ints.
  
  x : array_like, 1 dimension, nonnegative ints


This function accepts only integer values in the array.
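
A small check of that dtype requirement, using made-up prediction arrays: the integer array is counted normally, the float array raises a TypeError, and casting to np.int64 first avoids the error.

```python
import numpy as np

# Same class indices, once as integers and once as floats (made-up values)
int_preds = np.array([0, 1, 1, 2], dtype=np.int64)
float_preds = np.array([0.0, 1.0, 1.0, 2.0], dtype=np.float64)

counts = np.bincount(int_preds)

# np.bincount refuses to cast float64 to an integer type on its own
try:
    np.bincount(float_preds)
    float_failed = False
except TypeError:
    float_failed = True

# An explicit cast makes the float predictions acceptable
fixed_counts = np.bincount(float_preds.astype(np.int64))
```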

Your code will work if you replace the CatBoostClassifier with a scikit-learn classifier. VotingClassifier encodes the labels as integers before fitting its estimators, and scikit-learn classifiers return predictions with the same dtype as the labels they were trained on, so their predict() returns an array of np.int64.

CatBoostClassifier, however, returns np.float64, hence the error. It should arguably return int64 as well, because predict() is supposed to return classes, not float values. I don't know why it returns floats.

You can work around this by subclassing CatBoostClassifier and converting the predictions on the fly.

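    import numpy as np
    from catboost import CatBoostClassifier
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.model_selection import cross_validate

    class CatBoostClassifierInt(CatBoostClassifier):
        def predict(self, data, prediction_type='Class', ntree_start=0, ntree_end=0, thread_count=1, verbose=None):
            # Call the public predict() rather than the private _predict(),
            # whose signature differs between catboost versions
            predictions = super().predict(data, prediction_type=prediction_type, ntree_start=ntree_start,
                                          ntree_end=ntree_end, thread_count=thread_count, verbose=verbose)

            # The only real change: cast the predictions to int64 and flatten them
            return np.asarray(predictions, dtype=np.int64).ravel()

    clf1 = CatBoostClassifierInt()
    clf2 = RandomForestClassifier()
    clf = VotingClassifier(estimators=[('cb', clf1), ('rf', clf2)])
    # x_train and y_train come from the train_test_split call in the question
    cross_validate(clf, x_train, y_train, scoring='accuracy', return_train_score=True)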


Now you won't get that error.

A more correct version is the following. It handles any label type, keeps the input and output labels consistent, and works smoothly with scikit-learn:

    import numpy as np
    from catboost import CatBoostClassifier
    from sklearn.preprocessing import LabelEncoder

    class CatBoostClassifierCorrected(CatBoostClassifier):
        def fit(self, X, y=None, **fit_params):
            # Encode the labels as integers 0..n_classes-1 before training
            self.le_ = LabelEncoder().fit(y)
            transformed_y = self.le_.transform(y)

            # Delegate to the public fit(); extra arguments pass through unchanged
            super().fit(X, transformed_y, **fit_params)
            return self

        def predict(self, data, **predict_params):
            predictions = super().predict(data, **predict_params)

            # Cast to int64, flatten, and map the integer codes back to the original labels
            return self.le_.inverse_transform(np.asarray(predictions, dtype=np.int64).ravel())


This handles labels of any type, for example integers or strings.
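
The encode/decode round trip that this class relies on can be checked without CatBoost. In the sketch below the string labels are made up, and raw_predictions is only a stand-in for a model that returns class codes as floats in a column.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up string labels
y = np.array(["cat", "dog", "cat", "bird"])

le = LabelEncoder().fit(y)
encoded = le.transform(y)  # classes are sorted: bird=0, cat=1, dog=2

# Stand-in for model output: class codes as float64 in an (n_samples, 1) column
raw_predictions = encoded.astype(np.float64).reshape(-1, 1)

# Cast to int64 and flatten, then map the codes back to the original labels
decoded = le.inverse_transform(raw_predictions.astype(np.int64).ravel())
```

The decoded array matches the original labels, which is what lets the wrapper accept and return labels of the same type.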
                                                                        
                                                        
            
            
              
                