A TypeError with VotingClassifier

盖世英雄少女心 2021-01-16 20:21

I want to use VotingClassifier, but I run into a problem when cross-validating it:

    x_train, x_validation, y_train, y_validation = train_test_split(x, y, test         


        
1 answer
  • 2021-01-16 20:56

    This error is because of this line:

    np.bincount(x, weights=self._weights_not_none)
    

    Here, x holds the predictions returned by the individual classifiers inside the VotingClassifier.

    According to the documentation of np.bincount:

    Count number of occurrences of each value in array of non-negative ints.

    x : array_like, 1 dimension, nonnegative ints

    So np.bincount accepts only integer values in the array.
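
    To see the failure in isolation, here is a minimal sketch that mimics the hard-voting step with plain NumPy. The prediction matrix and the hard_vote helper are made up for illustration and are not scikit-learn's internals verbatim; only np.bincount, np.argmax and np.apply_along_axis are real API.

```python
import numpy as np

# Hypothetical predictions: one row per sample, one column per classifier.
int_preds = np.array([[0, 1, 1],
                      [2, 2, 0],
                      [1, 1, 1]])

def hard_vote(row, weights=None):
    # Count the votes for each class index and return the most frequent one.
    return np.argmax(np.bincount(row, weights=weights))

majority = np.apply_along_axis(hard_vote, axis=1, arr=int_preds)
print(majority)  # [1 2 1]

# The same votes stored as floats, as a classifier returning np.float64 would.
float_preds = int_preds.astype(np.float64)
try:
    np.apply_along_axis(hard_vote, axis=1, arr=float_preds)
    float_error = None
except TypeError as exc:
    # np.bincount refuses to cast float64 to an integer type safely.
    float_error = exc
    print("TypeError:", exc)
```

    With integer predictions the vote succeeds; with the same values stored as floats, np.bincount raises a TypeError.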

    Your code will work if you replace CatBoostClassifier with a scikit-learn classifier. VotingClassifier label-encodes y to integers before fitting its estimators, so scikit-learn classifiers return an np.int64 array from predict() here.

    CatBoostClassifier, however, returns np.float64 from predict(), hence the error. Arguably it should return int64 as well, because predict() is meant to return class labels rather than floats. I don't know why it returns floats.

    You can work around this by subclassing CatBoostClassifier and converting the predictions on the fly:

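    import numpy as np
    from catboost import CatBoostClassifier
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.model_selection import cross_validate

    class CatBoostClassifierInt(CatBoostClassifier):
        def predict(self, data, prediction_type='Class', ntree_start=0, ntree_end=0, thread_count=1, verbose=None):
            # Note: _predict is a private CatBoost method; its signature may differ between versions.
            predictions = self._predict(data, prediction_type, ntree_start, ntree_end, thread_count, verbose)
    
            # This line is the only change I made: cast the float predictions to int64
            return np.asarray(predictions, dtype=np.int64).ravel()
    
    # x_train and y_train come from the train_test_split call in the question
    clf1 = CatBoostClassifierInt()
    clf2 = RandomForestClassifier()
    clf = VotingClassifier(estimators=[('cb', clf1), ('rf', clf2)])
    cross_validate(clf, x_train, y_train, scoring='accuracy', return_train_score=True)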
    

    Now you won't get that error.

    A more correct version is the following. It encodes the labels in fit() and decodes them in predict(), so the output labels match the input labels whatever their type, and the class can be used with scikit-learn directly:

    from sklearn.preprocessing import LabelEncoder

    class CatBoostClassifierCorrected(CatBoostClassifier):
        def fit(self, X, y=None, cat_features=None, sample_weight=None, baseline=None, use_best_model=None,
            eval_set=None, verbose=None, logging_level=None, plot=False, column_description=None, verbose_eval=None):
    
            self.le_ = LabelEncoder().fit(y)
            transformed_y = self.le_.transform(y)
    
            self._fit(X, transformed_y, cat_features, None, sample_weight, None, None, None, baseline, use_best_model, eval_set, verbose, logging_level, plot, column_description, verbose_eval)
            return self
    
        def predict(self, data, prediction_type='Class', ntree_start=0, ntree_end=0, thread_count=1, verbose=None):
            predictions = self._predict(data, prediction_type, ntree_start, ntree_end, thread_count, verbose)
    
            # Cast to int64, then map the encoded predictions back to the original labels
            return self.le_.inverse_transform(predictions.astype(np.int64))
    

    This handles all the different label types.
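
    The encode/decode round trip behind that class can be tried without CatBoost. The following is a NumPy-only sketch of the same idea that LabelEncoder implements (sorted unique labels mapped to integer indices); the label values and the float predictions are invented for illustration.

```python
import numpy as np

# Hypothetical string labels, as they might appear in y.
y = np.array(["cat", "dog", "bird", "dog"])

# Encode: sorted unique labels, plus the integer index of each sample's label.
classes, encoded = np.unique(y, return_inverse=True)
print(classes)  # ['bird' 'cat' 'dog']
print(encoded)  # [1 2 0 2]

# Hypothetical predictions that come back as floats instead of integers.
float_preds = np.array([2.0, 0.0, 1.0])

# Decode: cast to int64 first, then look up the original labels.
decoded = classes[float_preds.astype(np.int64)]
print(decoded)  # ['dog' 'bird' 'cat']
```

    The cast to np.int64 before the lookup is the step that matters: float arrays are not valid indices, just as they are not valid input to np.bincount.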
