Getting the accuracy for multi-label prediction in scikit-learn

后端 未结 1 1997
陌清茗
陌清茗 2021-02-01 05:56

In a multilabel classification setting, sklearn.metrics.accuracy_score only computes the subset accuracy (3): i.e. the set of labels predicted for a sample must exactly

相关标签:
1条回答
  • 2021-02-01 06:46

    You can write one version yourself, here is a example without considering the weight and normalize.

    import numpy as np
    
    y_true = np.array([[0,1,0],
                       [0,1,1],
                       [1,0,1],
                       [0,0,1]])
    
    y_pred = np.array([[0,1,1],
                       [0,1,1],
                       [0,1,0],
                       [0,0,0]])
    
    def hamming_score(y_true, y_pred, normalize=True, sample_weight=None):
        '''
        Compute the Hamming score (a.k.a. label-based accuracy) for the multi-label case
        http://stackoverflow.com/q/32239577/395857
        '''
        acc_list = []
        for i in range(y_true.shape[0]):
            set_true = set( np.where(y_true[i])[0] )
            set_pred = set( np.where(y_pred[i])[0] )
            #print('\nset_true: {0}'.format(set_true))
            #print('set_pred: {0}'.format(set_pred))
            tmp_a = None
            if len(set_true) == 0 and len(set_pred) == 0:
                tmp_a = 1
            else:
                tmp_a = len(set_true.intersection(set_pred))/\
                        float( len(set_true.union(set_pred)) )
            #print('tmp_a: {0}'.format(tmp_a))
            acc_list.append(tmp_a)
        return np.mean(acc_list)
    
    if __name__ == "__main__":
        print('Hamming score: {0}'.format(hamming_score(y_true, y_pred))) # 0.375 (= (0.5+1+0+0)/4)
    
        # For comparison sake:
        import sklearn.metrics
    
        # Subset accuracy
        # 0.25 (= 0+1+0+0 / 4) --> 1 if the prediction for one sample fully matches the gold. 0 otherwise.
        print('Subset accuracy: {0}'.format(sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)))
    
        # Hamming loss (smaller is better)
        # $$ \text{HammingLoss}(x_i, y_i) = \frac{1}{|D|} \sum_{i=1}^{|D|} \frac{xor(x_i, y_i)}{|L|}, $$
        # where
        #  - \\(|D|\\) is the number of samples  
        #  - \\(|L|\\) is the number of labels  
        #  - \\(y_i\\) is the ground truth  
        #  - \\(x_i\\)  is the prediction.  
        # 0.416666666667 (= (1+0+3+1) / (3*4) )
        print('Hamming loss: {0}'.format(sklearn.metrics.hamming_loss(y_true, y_pred))) 
    

    Outputs:

    Hamming score: 0.375
    Subset accuracy: 0.25
    Hamming loss: 0.416666666667
    
    0 讨论(0)
提交回复
热议问题