VotingClassifier: Different Feature Sets


不要未来只要你来 2021-02-02 16:25

I have two different feature sets (with the same number of rows and the same labels), in my case DataFrames:

df1:

| A


      
      
        
2 answers

长发绾君心 (original poster) 2021-02-02 16:41

It's pretty easy to write custom functions that do what you want to achieve.

Import the prerequisites:

import numpy as np
from sklearn.preprocessing import LabelEncoder

def fit_multiple_estimators(classifiers, X_list, y, sample_weights=None):

    # Encode the labels `y` as integers with LabelEncoder, because the predict step
    # works with class indices, which are converted back to the original labels later.
    le_ = LabelEncoder()
    transformed_y = le_.fit_transform(y)

    # Fit each estimator on its own feature array, using the encoded labels
    estimators_ = []
    for (_, clf), X in zip(classifiers, X_list):
        if sample_weights is None:
            clf.fit(X, transformed_y)
        else:
            # Only works for estimators whose fit() accepts sample_weight
            clf.fit(X, transformed_y, sample_weight=sample_weights)
        estimators_.append(clf)

    return estimators_, le_


def predict_from_multiple_estimator(estimators, label_encoder, X_list, weights = None):

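    # 'Soft' voting: stack each estimator's class probabilities into an array
    # of shape (n_estimators, n_samples, n_classes)
    pred1 = np.asarray([clf.predict_proba(X) for clf, X in zip(estimators, X_list)])
    # Average over the estimator axis (optionally weighted), then pick the most probable class
    pred2 = np.average(pred1, axis=0, weights=weights)
    pred = np.argmax(pred2, axis=1)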

    # Convert integer predictions to original labels:
    return label_encoder.inverse_transform(pred)


The logic is adapted from the scikit-learn VotingClassifier source.
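
To make the averaging step concrete, here is a small worked example. The probability values and the `weights` are made up for illustration and are not output from any classifier in this answer. It only shows how `np.average` over the estimator axis combines the probability matrices before `np.argmax` picks a class.

```python
import numpy as np

# Hypothetical predict_proba outputs from two estimators,
# each of shape (n_samples=2, n_classes=3).
proba_a = np.array([[0.6, 0.3, 0.1],
                    [0.2, 0.5, 0.3]])
proba_b = np.array([[0.2, 0.7, 0.1],
                    [0.1, 0.2, 0.7]])

stacked = np.asarray([proba_a, proba_b])  # shape (2, 2, 3)

# Unweighted soft vote: plain mean over the estimator axis.
unweighted = np.argmax(np.average(stacked, axis=0), axis=1)

# Weighted soft vote: the first estimator counts three times as much.
weighted = np.argmax(np.average(stacked, axis=0, weights=[3, 1]), axis=1)

print(unweighted)  # [1 2]
print(weighted)    # [0 1]
```

With equal weights the second estimator's confident votes win. Once the first estimator is weighted more heavily, both predictions change.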

Now test the functions above.
First, get some data:

from sklearn.datasets import load_iris
data = load_iris()
X = data.data

# Convert int classes to string labels by indexing the array of class names
y = data.target_names[data.target]


Split the data into train and test:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)


Split X into two different feature sets:

X_train1, X_train2 = X_train[:,:2], X_train[:,2:]
X_test1, X_test2 = X_test[:,:2], X_test[:,2:]

X_train_list = [X_train1, X_train2]
X_test_list = [X_test1, X_test2]
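
Since the question starts from two DataFrames rather than one array, the same lists could be built roughly as follows. This is only a sketch: `df1`, `df2`, their column names, the random values, and `labels` are hypothetical stand-ins for the asker's data. It relies on `train_test_split` accepting several arrays in one call and splitting them all on the same rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the two feature sets in the question:
# same number of rows, same labels, different columns.
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.normal(size=(20, 2)), columns=["A", "B"])
df2 = pd.DataFrame(rng.normal(size=(20, 3)), columns=["C", "D", "E"])
labels = np.array(["yes", "no"] * 10)

# Passing all three objects in one call keeps the rows aligned:
# every output is built from the same shuffled row indices.
(df1_train, df1_test,
 df2_train, df2_test,
 y_train, y_test) = train_test_split(df1, df2, labels, random_state=0)

# One feature array per estimator, in the same order as the classifiers.
X_train_list = [df1_train.to_numpy(), df2_train.to_numpy()]
X_test_list = [df1_test.to_numpy(), df2_test.to_numpy()]
```

Splitting each DataFrame in a separate call without a shared `random_state` would shuffle the rows differently and break the row-to-label correspondence.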


Create the list of classifiers:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Make sure the number of estimators here equals the number of feature sets
classifiers = [('knn',  KNeighborsClassifier(3)),
    ('svc', SVC(kernel="linear", C=0.025, probability=True))]


Fit the classifiers with the data:

fitted_estimators, label_encoder = fit_multiple_estimators(classifiers, X_train_list, y_train)


Predict using the test data:

y_pred = predict_from_multiple_estimator(fitted_estimators, label_encoder, X_test_list)
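
The prediction function above uses soft voting, which needs `predict_proba` on every estimator. If that is not available, a majority ("hard") vote is one alternative. The helper below is a sketch, not part of the functions above: `hard_vote` is a hypothetical name, and it assumes each estimator's predictions are already integer-encoded class indices, for example via the label encoder.

```python
import numpy as np

def hard_vote(predictions, weights=None):
    """Majority vote over integer-encoded class predictions.

    predictions: array-like of shape (n_estimators, n_samples) holding the
    encoded label that each estimator predicted for each sample.
    """
    predictions = np.asarray(predictions)
    # For each sample (column), count the optionally weighted votes per class
    # and keep the class with the most votes. Ties go to the lowest class
    # index, because argmax returns the first maximum.
    return np.apply_along_axis(
        lambda votes: np.argmax(np.bincount(votes, weights=weights)),
        axis=0,
        arr=predictions,
    )

# Made-up predictions from three estimators for four samples.
preds = [[0, 1, 2, 2],
         [0, 2, 2, 1],
         [1, 1, 2, 0]]
print(hard_vote(preds))                     # [0 1 2 0]
print(hard_vote(preds, weights=[1, 1, 3]))  # [1 1 2 0]
```

The result can be passed to `label_encoder.inverse_transform` to recover the original labels, just as in the soft-voting version.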


Get accuracy of predictions:

from sklearn.metrics import accuracy_score
print(accuracy_score(y_test, y_pred))
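
If the feature sets can be concatenated into a single array, a different approach is possible with scikit-learn's own `VotingClassifier`: wrap each estimator in a pipeline that first selects its columns. The sketch below is an alternative to the custom functions in this answer, not the same method. `on_columns` is a hypothetical helper name, and the column indices assume the same two-column split of the iris features used above.

```python
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def on_columns(columns, estimator):
    # Pass through only `columns` (the rest are dropped by default),
    # then hand the reduced array to the estimator.
    selector = ColumnTransformer([("select", "passthrough", columns)])
    return make_pipeline(selector, estimator)

voter = VotingClassifier(
    estimators=[
        ("knn", on_columns([0, 1], KNeighborsClassifier(3))),
        ("svc", on_columns([2, 3], SVC(kernel="linear", C=0.025, probability=True))),
    ],
    voting="soft",
)
voter.fit(X_train, y_train)

score = voter.score(X_test, y_test)
print(score)
```

Because the whole ensemble is then a single estimator, it should also work with tools such as `cross_val_score` or `GridSearchCV`, which the custom functions do not support directly.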


Feel free to ask if anything is unclear.
    
             
                                                        
            
            
              
                