Scoring metrics from Keras scikit-learn wrapper in cross validation with one-hot encoded labels


I am implementing a neural network and I would like to assess its performance with cross validation. Here is my current code:

def recall_m(y_true, y_pred):
    t


                      
3 Answers

青春惊慌失措, 2021-01-26 08:09

cross_val_score is not the appropriate tool here; you should take manual control of your CV procedure. Here is how, combining aspects from my answer in the SO thread you have linked, as well as from Cross-validation metrics in scikit-learn for each data split, and using accuracy just as an example metric:

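from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score
# the scikit-learn wrapper used in the question (shipped with older TensorFlow releases)
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
import numpy as np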

n_splits = 10
kf = KFold(n_splits=n_splits, shuffle=True)
cv_acc = []

# prepare a single-digit copy of your 1-hot encoded true labels:
y_single = np.argmax(y, axis=1)

for train_index, val_index in kf.split(x):
    # fit & predict
    model = KerasClassifier(build_fn=build_model, batch_size=10, epochs=ep)
    model.fit(x[train_index], y[train_index])
    pred = model.predict(x[val_index]) # the wrapper's predict() returns single-digit classes

    # get fold accuracy & append
    fold_acc = accuracy_score(y_single[val_index], pred)
    cv_acc.append(fold_acc)

acc = np.mean(cv_acc)


At completion of the loop, you will have the accuracies of each fold in the list cv_acc, and taking the mean will give you the average value.

This way, you don't need the custom definitions you use for precision, recall, and f1; you can just use the respective ones from scikit-learn. You can add as many different metrics as you want in the loop (something you cannot do with cross_val_score), as long as you import them appropriately from scikit-learn, as done here with accuracy_score.
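As a rough sketch of collecting several metrics in the same loop: the example below uses synthetic data and a scikit-learn LogisticRegression as a stand-in for the Keras wrapper (both are assumptions, made so the snippet runs without TensorFlow). Only the metric bookkeeping is the point here, and macro averaging is just one possible choice.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import KFold

# Synthetic 3-class data (assumption); y is one-hot encoded as in the question.
x, y_int = make_classification(n_samples=300, n_features=10, n_informative=5,
                               n_classes=3, random_state=0)
y = np.eye(3)[y_int]
y_single = np.argmax(y, axis=1)  # integer copy of the one-hot labels

n_splits = 5
kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
cv_scores = {"accuracy": [], "precision": [], "recall": [], "f1": []}

for train_index, val_index in kf.split(x):
    # Stand-in model (assumption): with Keras you would fit the wrapper on y[train_index].
    model = LogisticRegression(max_iter=1000)
    model.fit(x[train_index], y_single[train_index])
    pred = model.predict(x[val_index])  # integer class labels

    true = y_single[val_index]
    cv_scores["accuracy"].append(accuracy_score(true, pred))
    cv_scores["precision"].append(precision_score(true, pred, average="macro", zero_division=0))
    cv_scores["recall"].append(recall_score(true, pred, average="macro", zero_division=0))
    cv_scores["f1"].append(f1_score(true, pred, average="macro", zero_division=0))

# Mean and spread across folds for every metric.
summary = {name: (np.mean(vals), np.std(vals)) for name, vals in cv_scores.items()}
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```

Reporting the standard deviation next to the mean gives a feel for how much the scores vary between folds.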
                                                                        
                                                        
            
            
              
                
无人及你, 2021-01-26 08:20

For anybody still wanting to use cross_validate with one-hot encoded labels, here is a more scikit-learn-oriented way to go about it:
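import numpy as np
import tensorflow as tf
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import make_scorer, f1_score, precision_score, recall_score, accuracy_score
from sklearn.model_selection import cross_validate

X, y = get_data()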
# in my application I have words as labels, so y is a np.array with strings
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y)

# build a version of the scoring metrics for multi-class and one-hot encoding predictions
labels = sorted(set(np.unique(y_encoded)) - set(encoder.transform(['nan'])))

# these functions compare y (one-hot encoded) to y_pred (integer encoded)
# by making y integer encoded as well

def f1_categorical(y, y_pred, **kwargs):
    return f1_score(y.argmax(1), y_pred, **kwargs)

def precision_categorical(y, y_pred, **kwargs):
    return precision_score(y.argmax(1), y_pred, **kwargs)

def recall_categorical(y, y_pred, **kwargs):
    return recall_score(y.argmax(1), y_pred, **kwargs)

def accuracy_categorical(y, y_pred, **kwargs):
    return accuracy_score(y.argmax(1), y_pred, **kwargs)

# Wrap the functions above with `make_scorer`
# (here I chose the micro average because it worked for my multi-class application)
our_f1 = make_scorer(f1_categorical, labels=labels, average="micro")
our_precision = make_scorer(precision_categorical, labels=labels, average="micro")
our_recall = make_scorer(recall_categorical, labels=labels, average="micro")
our_accuracy = make_scorer(accuracy_categorical)
scoring = {
    'accuracy':our_accuracy,
    'f1':our_f1,
    'precision':our_precision,
    'recall':our_recall
}

# one-hot encoding
y_categorical = tf.keras.utils.to_categorical(y_encoded)

# keras wrapper
estimator = tf.keras.wrappers.scikit_learn.KerasClassifier(
                build_fn=model_with_one_hot_encoded_output,
                epochs=1,
                batch_size=32,
                verbose=1)

# cross validate as usual (X_scaled is the feature matrix after scaling; that step is not shown here)
results = cross_validate(estimator, 
                         X_scaled, y_categorical, 
                         scoring=scoring)
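To see what these wrappers do, here is a small sanity check on hand-made arrays (the arrays are made up for illustration). It calls one wrapper directly, without cross_validate or a Keras model, and relies on the fact that for single-label multi-class data the micro-averaged F1 over all classes equals plain accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def f1_categorical(y, y_pred, **kwargs):
    # y is one-hot encoded, y_pred is integer encoded
    return f1_score(y.argmax(1), y_pred, **kwargs)

# Four samples, three classes; the true classes are 0, 1, 2, 2.
y_onehot = np.eye(3)[[0, 1, 2, 2]]
# Integer-encoded predictions; the last one is wrong.
y_pred = np.array([0, 1, 2, 1])

micro_f1 = f1_categorical(y_onehot, y_pred, average="micro")
accuracy = accuracy_score(y_onehot.argmax(1), y_pred)
print(micro_f1, accuracy)  # 3 of 4 predictions are correct, so both come out at 0.75
```

If the two numbers ever disagree on your own data, a restricted `labels` list (as in the answer above) is the first thing to check, since it excludes classes from the average.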

                                                                        
                                                        
            
            
              
                
予麋鹿, 2021-01-26 08:31

I've been experimenting with @desertnaut's answer. However, because I have a multi-class problem, I ran into problems, not with the loop itself but with the np.argmax() line. After googling I did not find an easy way to resolve it, so I ended up (on this user's advice) implementing CV by hand. It was a bit more complicated because I am using a pandas DataFrame (and the code can definitely be cleaned up further), but here is the working code:

import numpy as np
import pandas as pd

# feature and one-hot label columns, defined once instead of repeated per fold
feature_cols = ['start-sin', 'start-cos', 'start-sin-lag', 'start-cos-lag', 'prev-close-sin', 'prev-close-cos', 'prev-length', 'state-lag', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']
label_cols = ['wait-categ-none', 'wait-categ-short', 'wait-categ-medium', 'wait-categ-long']

ep = 120
n_folds = 10
df_split = np.array_split(df, n_folds)
acc = []
f1 = []
prec = []
recalls = []
for test_part in range(n_folds):
    # build a fresh model for every fold
    model = build_model()
    print("CV Fold, with test partition i = ", test_part)

    # all partitions except the current one form the training set
    train_df = pd.concat([df_split[i] for i in range(n_folds) if i != test_part], axis=0)
    test_df = df_split[test_part]
    train_x = train_df[feature_cols]
    test_x = test_df[feature_cols]

    # enforce 0/1 labels instead of booleans
    train_y = train_df[label_cols].replace(False, 0).replace(True, 1)
    test_y = test_df[label_cols].replace(False, 0).replace(True, 1)

    # fit
    model.fit(train_x, train_y, epochs=ep, verbose=1)
    # score (build_model() is expected to compile the model with these four metrics, in this order)
    loss, accuracy, f1_value, precision, recall = model.evaluate(test_x, test_y, verbose=0)
    # save
    acc.append(accuracy)
    f1.append(f1_value)
    prec.append(precision)
    recalls.append(recall)
print("CV finished.\n")

print("Mean Accuracy")
print(sum(acc)/len(acc))
print("Mean F1 score")
print(sum(f1)/len(f1))
print("Mean Precision")
print(sum(prec)/len(prec))
print("Mean Recall rate")
print(sum(recalls)/len(recalls))
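The same partitioning can probably be written more compactly with scikit-learn's KFold and positional indexing via .iloc. The sketch below only shows the splitting step; the tiny DataFrame and its two-feature, two-label layout are invented for illustration, with column names borrowed from the code above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

# Toy frame (assumption): 20 rows and a small subset of the columns used above.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "start-sin": rng.normal(size=20),
    "start-cos": rng.normal(size=20),
    "wait-categ-none": rng.integers(0, 2, size=20).astype(bool),
})
df["wait-categ-short"] = ~df["wait-categ-none"]

feature_cols = ["start-sin", "start-cos"]
label_cols = ["wait-categ-none", "wait-categ-short"]

fold_sizes = []
# shuffle=False keeps contiguous partitions, similar to np.array_split.
for train_idx, test_idx in KFold(n_splits=10, shuffle=False).split(df):
    train_x = df.iloc[train_idx][feature_cols]
    train_y = df.iloc[train_idx][label_cols].astype(int)  # booleans -> 0/1
    test_x = df.iloc[test_idx][feature_cols]
    test_y = df.iloc[test_idx][label_cols].astype(int)
    # A model would be built, fitted on train_x/train_y and evaluated on test_x/test_y here.
    fold_sizes.append((len(train_x), len(test_x)))

print(fold_sizes[0])
```

Because KFold yields integer positions, .iloc is used rather than label-based indexing, which matters when the DataFrame index is not a plain 0..n-1 range.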

                                                                        
                                                        
            
            
              
                