Question
When Keras 2.x removed certain metrics, the changelog said it did so because they were "Batch-based" and therefore not always accurate. What is meant by this? Do the corresponding metrics included in tensorflow suffer from the same drawbacks? For example: precision and recall metrics.
Answer 1:
Let's take precision for example. The stateless version which was removed was implemented like so:
from keras import backend as K

def precision(y_true, y_pred):
    """Precision metric.

    Only computes a batch-wise average of precision.

    Computes the precision, a metric for multi-label classification of
    how many selected items are relevant.
    """
    true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
    predicted_positives = K.sum(K.round(K.clip(y_pred, 0, 1)))
    precision = true_positives / (predicted_positives + K.epsilon())
    return precision
This is fine if y_true contains all of the labels in the dataset and y_pred contains the model's predictions for all of those labels.
The issue is that people often evaluate their datasets in batches, for example evaluating on 10,000 images by running 10 evaluations of 1,000 images each. This can be necessary to fit memory constraints. In that case you get 10 different precision scores, and simply averaging them does not in general equal the precision over the whole dataset, because batches with few predicted positives count as much as batches with many.
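A minimal NumPy sketch (the batch values are my own illustrative numbers) showing how averaging batch-wise precisions diverges from the precision computed over the full dataset:

```python
import numpy as np

# Batch-wise precision, mirroring the removed Keras metric above.
def batch_precision(y_true, y_pred, eps=1e-7):
    true_positives = np.sum(np.round(np.clip(y_true * y_pred, 0, 1)))
    predicted_positives = np.sum(np.round(np.clip(y_pred, 0, 1)))
    return true_positives / (predicted_positives + eps)

# Two toy batches with very different numbers of predicted positives.
b1_true, b1_pred = np.array([1, 1, 0, 0]), np.array([1, 1, 1, 1])  # precision 2/4
b2_true, b2_pred = np.array([1, 0, 0, 0]), np.array([1, 0, 0, 0])  # precision 1/1

avg_of_batches = (batch_precision(b1_true, b1_pred)
                  + batch_precision(b2_true, b2_pred)) / 2   # ~0.75
overall = batch_precision(np.concatenate([b1_true, b2_true]),
                          np.concatenate([b1_pred, b2_pred]))  # 3/5 = 0.6
```

The mean of the per-batch scores (0.75) overstates the true dataset precision (0.6) because the second batch's single prediction carries as much weight as the first batch's four.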
Stateful metrics solve this issue by keeping intermediate values in variables that persist for the whole evaluation. In the case of precision, a stateful metric keeps persistent counters for true_positives and predicted_positives and computes the final ratio only after all batches have been seen. TensorFlow metrics are stateful, e.g. tf.metrics.precision (tf.keras.metrics.Precision in TF 2.x).
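The stateful pattern can be sketched in plain Python. This hypothetical StatefulPrecision class (my own illustration, not TensorFlow's implementation) follows the update_state/result/reset_state interface that tf.keras.metrics.Metric subclasses use:

```python
class StatefulPrecision:
    """Accumulates true-positive and predicted-positive counts across
    batches, mimicking the update_state/result API of stateful metrics."""

    def __init__(self):
        self.true_positives = 0
        self.predicted_positives = 0

    def update_state(self, y_true, y_pred):
        # y_true, y_pred: iterables of 0/1 labels for one batch.
        for t, p in zip(y_true, y_pred):
            if p == 1:
                self.predicted_positives += 1
                if t == 1:
                    self.true_positives += 1

    def result(self):
        if self.predicted_positives == 0:
            return 0.0
        return self.true_positives / self.predicted_positives

    def reset_state(self):
        self.true_positives = 0
        self.predicted_positives = 0

# Feed batches one at a time; result() reflects the whole evaluation.
metric = StatefulPrecision()
metric.update_state([1, 1, 0, 0], [1, 1, 1, 1])
metric.update_state([1, 0, 0, 0], [1, 0, 0, 0])
```

Because the counters span all batches, result() returns the dataset-level precision (3/5 = 0.6 here) rather than an average of per-batch scores.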
Source: https://stackoverflow.com/questions/51734955/keras-metrics-with-tf-backend-vs-tensorflow-metrics