Classification Report - Precision and F-score are ill-defined

面向向阳花 2021-01-05 14:25

I imported classification_report from sklearn.metrics, and when I pass my np.arrays as parameters I get the following error:

/usr/local/

2 Answers
  • 2021-01-05 14:56

    This is not an error, just a warning that not all your labels are included in your y_pred, i.e. there are some labels in your y_test that your classifier never predicts.

    Here is a simple reproducible example:

    from sklearn.metrics import precision_score, f1_score, classification_report
    
    y_true = [0, 1, 2, 0, 1, 2] # 3-class problem
    y_pred = [0, 0, 1, 0, 0, 1] # we never predict '2'
    
    precision_score(y_true, y_pred, average='macro') 
    [...] UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. 
      'precision', 'predicted', average, warn_for)
    0.16666666666666666
    
    precision_score(y_true, y_pred, average='micro') # no warning
    0.3333333333333333
    
    precision_score(y_true, y_pred, average=None) 
    [...] UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. 
      'precision', 'predicted', average, warn_for)
    array([0.5, 0. , 0. ])
    

The exact same warnings are produced for f1_score.
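As a quick sketch using the same toy data as above, you can confirm the f1_score behaviour; the labels with no predicted samples get an F-score of 0, which drags down the macro average:

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 0, 1, 2]  # 3-class problem
y_pred = [0, 0, 1, 0, 0, 1]  # we never predict '2'

# f1 for label 0 is 0.6667; labels 1 and 2 are set to 0.0,
# so the macro average is 0.6667 / 3
print(f1_score(y_true, y_pred, average='macro'))  # 0.2222...
```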

Practically, this only warns you that, in the classification_report, the respective values for labels with no predicted samples (here label 2) will be set to 0:

    print(classification_report(y_true, y_pred))
    
    
                  precision    recall  f1-score   support
    
               0       0.50      1.00      0.67         2
               1       0.00      0.00      0.00         2
               2       0.00      0.00      0.00         2
    
       micro avg       0.33      0.33      0.33         6
       macro avg       0.17      0.33      0.22         6
    weighted avg       0.17      0.33      0.22         6
    
    [...] UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. 
      'precision', 'predicted', average, warn_for)
    
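If you want to silence the warning and control the fallback value explicitly, newer scikit-learn versions (0.23 onward) expose a `zero_division` parameter on these metrics. A minimal sketch, again on the toy data above:

```python
import warnings
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 0, 1, 2]
y_pred = [0, 0, 1, 0, 0, 1]  # we never predict '2'

# zero_division=0 sets ill-defined precisions to 0.0 silently
with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning would now raise
    macro_p = precision_score(y_true, y_pred,
                              average='macro', zero_division=0)

print(macro_p)  # 0.1666... with no UndefinedMetricWarning
```

The same parameter is accepted by `f1_score` and `classification_report`; passing `zero_division=1` would instead report 1.0 for the undefined entries.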

"When I was not using np.array in the past it worked just fine"

Highly doubtful, since the example above uses plain Python lists, not NumPy arrays...

  • 2021-01-05 15:06

It means that some labels are present only in the training data and some labels are present only in the test dataset. Run the following code to inspect the distribution of train and test labels:

    from collections import Counter
    Counter(y_train)
    Counter(y_test)
    

Use a stratified train_test_split to avoid the situation where some labels are present only in the test dataset.

It might have worked in the past simply because of the random splitting of the dataset. Hence, stratified splitting is always recommended.
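A minimal sketch of a stratified split; the toy `X` and `y` here are illustrative assumptions, not the asker's data. Passing `stratify=y` makes train_test_split preserve the class proportions in both halves, so every label appears in both y_train and y_test:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(12).reshape(-1, 1)                    # toy features
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])  # 4 samples per class

# stratify=y keeps the 1:1:1 class ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

print(sorted(y_test))  # one sample of each class: [0, 1, 2]
```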

If a label appears in both splits but the classifier still never predicts it (the situation in the first answer), that is more a matter of model fine-tuning or choice of model.
