I am wondering how Keras computes a metric (whether custom or built-in).
For example, suppose I have the following metric, which yields the maximal error between the prediction and the ground truth:
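A minimal sketch of such a metric (assuming the TensorFlow backend; the exact reduction is just an example) could be:

```python
import tensorflow as tf

def max_error(y_true, y_pred):
    # Largest absolute difference between prediction and ground truth
    # over the samples the metric is currently given. Note that when a
    # plain function is passed as a metric, Keras averages the per-batch
    # values of this quantity over the epoch.
    return tf.reduce_max(tf.abs(y_true - y_pred))

# passed to the model via:
# model.compile(optimizer="adam", loss="mse", metrics=[max_error])
```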
There is a difference between how the metric is computed on the training dataset and on the validation dataset. For the validation set, the metric is calculated once at the end of each epoch, over the whole validation dataset. For the training set, the metric is calculated at the end of each batch, and the running average keeps getting updated until the end of the epoch.
As you can see, the metric on the training set is evaluated on the fly, so each batch is evaluated with different weights. That is why the training metric sometimes shows strange behaviour.
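One way to see this difference (a sketch, assuming the `max_error` metric above and hypothetical `x_train`/`y_train` arrays) is a callback that re-evaluates the metric on the full training set at epoch end, with the weights fixed at their end-of-epoch values, and compares it with the running average that Keras logged during the epoch:

```python
import tensorflow as tf
from tensorflow import keras

class CompareTrainMetric(keras.callbacks.Callback):
    """Re-evaluate the metric on the whole training set at epoch end
    (single set of weights) and compare it with the running average
    that Keras logs as the 'training' metric."""

    def __init__(self, x_train, y_train):
        super().__init__()
        self.x_train = x_train
        self.y_train = y_train

    def on_epoch_end(self, epoch, logs=None):
        # Uses the weights as they are at the end of the epoch,
        # just like the validation metric does.
        results = self.model.evaluate(self.x_train, self.y_train,
                                      verbose=0, return_dict=True)
        print(f"epoch {epoch}: logged train max_error = {logs['max_error']:.4f}, "
              f"re-evaluated max_error = {results['max_error']:.4f}")

# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           callbacks=[CompareTrainMetric(x_train, y_train)])
```

The re-evaluated value is computed the same way as the validation metric (one fixed set of weights), so it will typically differ from the logged training value, which mixes results from batches evaluated with different weights.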