scikit weighted f1 score calculation and usage


Question


I have a question regarding weighted average in sklearn.metrics.f1_score

sklearn.metrics.f1_score(y_true, y_pred, labels=None, pos_label=1, average='weighted', sample_weight=None)

Calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label). This alters ‘macro’ to account for label imbalance; it can result in an F-score that is not between precision and recall.

First, is there any reference that justifies the usage of weighted-F1? I am just curious about the cases in which I should use weighted-F1.

Second, I heard that weighted-F1 is deprecated. Is that true?

Third, how is weighted-F1 actually calculated? For example:

{
    "0": {
        "TP": 2,
        "FP": 1,
        "FN": 0,
        "F1": 0.8
    },
    "1": {
        "TP": 0,
        "FP": 2,
        "FN": 2,
        "F1": -1
    },
    "2": {
        "TP": 1,
        "FP": 1,
        "FN": 2,
        "F1": 0.4
    }
}

How do I calculate the weighted-F1 of the above example? I thought it should be something like (0.8*2/3 + 0.4*1/3)/3, but I was wrong.


Answer 1:


First, is there any reference that justifies the usage of weighted-F1? I am just curious about the cases in which I should use weighted-F1.

I don't have any references, but if you're interested in multi-label classification where you care about precision/recall of all classes, then the weighted f1-score is appropriate. If you have binary classification where you just care about the positive samples, then it is probably not appropriate.
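For intuition, here is a quick sketch with made-up labels (the data is purely illustrative, not from the question) that contrasts the 'macro' and 'weighted' averages on an imbalanced toy problem:

from sklearn.metrics import f1_score

# Toy multiclass data, heavily skewed towards class 0 (illustrative only).
y_true = [0, 0, 0, 0, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2]

# 'macro' gives every class the same weight, regardless of support.
print(f1_score(y_true, y_pred, average='macro'))     # ~0.694
# 'weighted' weights each class's F1 by its support, so the dominant
# class 0 contributes most to the final score.
print(f1_score(y_true, y_pred, average='weighted'))  # ~0.714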

Second, I heard that weighted-F1 is deprecated. Is that true?

No, weighted-F1 itself is not deprecated. Only some aspects of the function interface were deprecated, back in v0.16, and only to make the behaviour more explicit in previously ambiguous situations. (See the historical discussion on GitHub, or check out the source code and search the page for "deprecated" to find the details.)

Third, how is weighted-F1 actually calculated?

From the documentation of f1_score:

``'weighted'``:
  Calculate metrics for each label, and find their average, weighted
  by support (the number of true instances for each label). This
  alters 'macro' to account for label imbalance; it can result in an
  F-score that is not between precision and recall.

So the average is weighted by the support, which is the number of samples with a given label. Because your example data above does not include the support, it is impossible to compute the weighted f1 score from the information you listed.
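To make the formula concrete, here is a small worked sketch (with invented data, since the support is missing from your listing) that reproduces average='weighted' by hand as a support-weighted mean of the per-class F1 scores:

import numpy as np
from sklearn.metrics import f1_score

# Invented toy data, for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 0, 2])

labels = [0, 1, 2]
per_class_f1 = f1_score(y_true, y_pred, labels=labels, average=None)
support = np.array([(y_true == c).sum() for c in labels])  # true count per label

# Support-weighted average of the per-class F1 scores ...
manual = (per_class_f1 * support).sum() / support.sum()
print(manual)
# ... matches what sklearn reports for average='weighted'.
print(f1_score(y_true, y_pred, average='weighted'))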



Source: https://stackoverflow.com/questions/33326810/scikit-weighted-f1-score-calculation-and-usage
