Question
I know that MAPE and WMAPE have some benefits as forecast error metrics. But what are their gaps? I have seen these two statements:
For MAPE: "Combinations with very small or zero volumes can cause large skew in results". And for WMAPE: "Combinations with large weights can skew the results in their favor".
I don't understand them. Can anyone explain these two statements about the weaknesses of the two metrics? Thanks.
Answer 1:
For MAPE (mean absolute percentage error) [1], denote the actual value by A and the predicted value by P. If you have a series of data at times 1 through n, then
MAPE = 100/n * ( Sum of |(A(t) - P(t))/A(t)| ), for t in 1..n
where A(t) is the actual value at time t, P(t) is the predicted value at time t.
Since A(t) is in the denominator, a very small A(t) makes the absolute percentage error for that term very large, and a zero A(t) makes it undefined (division by zero). A few such terms can dominate the mean, which is what causes the large skew in the results.
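To make this concrete, here is a minimal Python sketch of the MAPE formula above. The function name `mape` and all the numbers are made up for illustration. Both series have forecasts that are off by exactly 10 units everywhere; the only difference is that the last actual value is 1 instead of 400.

```python
def mape(actual, predicted):
    """MAPE in percent, following the formula above.

    Assumes no actual value is exactly zero (that would divide by zero).
    """
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


# Hypothetical data: every forecast misses by 10 units.
actual_ok = [100, 200, 300, 400]
predicted_ok = [110, 210, 310, 410]

# Same 10-unit misses, but the last actual volume is tiny.
actual_small = [100, 200, 300, 1]
predicted_small = [110, 210, 310, 11]

print(mape(actual_ok, predicted_ok))        # roughly 5.2
print(mape(actual_small, predicted_small))  # roughly 254.6
```

The single small-volume point contributes an absolute percentage error of 1000%, so it swamps the other three terms.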
For WMAPE, Weighted mean absolute percentage error,
WMAPE = ( Sum of |(A(t) - P(t))/A(t)| * W(t) ) / ( Sum of W(t) ), for t in 1..n
where W(t) is the weight you associate with the prediction at time t.
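Since this is a weighted measure, it can avoid the problem MAPE has with very small or zero volumes. For example, if the weights are the actual volumes, W(t) = A(t), the A(t) in the denominator cancels and the measure becomes Sum of |A(t) - P(t)| / Sum of A(t), so a low-volume point only contributes in proportion to its volume.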
However, the weighting factor expresses the subjective importance we wish to place on each prediction [2].
For instance, if we weight by release date, we can assign higher weights to more recent data so that it counts for more. In this case we could observe that even when the MAE is under a reasonable threshold, the performance of a system might be inadequate when analyzing this particular feature.
This is how favoring more recent data skews the results.
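A small Python sketch of the weighting effect, using the WMAPE formula above. The function name `wmape`, the weights, and the data are all hypothetical. The same 50% miss is scored very differently depending on whether it falls on a heavily weighted (recent) point or a lightly weighted (old) one.

```python
def wmape(actual, predicted, weights):
    """Weighted MAPE as a fraction: sum(|(A - P) / A| * W) / sum(W).

    Assumes no actual value is zero and the weights do not sum to zero.
    """
    weighted_errors = sum(
        abs((a - p) / a) * w for a, p, w in zip(actual, predicted, weights)
    )
    return weighted_errors / sum(weights)


actual = [100, 100, 100, 100]
late_miss = [100, 100, 100, 150]   # only the most recent forecast is 50% off
early_miss = [150, 100, 100, 100]  # only the oldest forecast is 50% off

equal_weights = [1, 1, 1, 1]
recent_weights = [1, 1, 1, 7]      # made-up weights favoring recent data

print(wmape(actual, late_miss, equal_weights))    # 0.125
print(wmape(actual, early_miss, equal_weights))   # 0.125
print(wmape(actual, late_miss, recent_weights))   # about 0.35
print(wmape(actual, early_miss, recent_weights))  # about 0.05
```

With equal weights the two forecasts look equally good; with the recency weights one looks seven times worse than the other, even though the size of the miss is identical.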
[1] http://en.wikipedia.org/wiki/Mean_absolute_percentage_error
[2] http://ir.ii.uam.es/rue2012/papers/rue2012-cleger-tamayo.pdf
Answer 2:
There is also another error metric:
WAPE = 100 * Sum(|A(t) - P(t)|) / Sum(A(t)), for t in 1..n
where A(t) is the actual value at time t, P(t) is the predicted value at time t.
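It divides the total absolute error by the total of the actual values rather than dividing each error by its own A(t), so a single very small or zero A(t) cannot distort the result (as long as the total of the actuals is not itself close to zero).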
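For comparison, a quick Python sketch that computes both metrics on the same made-up data. It assumes the common definition WAPE = 100 * Sum|A - P| / Sum A (total error over total actuals, with no division by n); the function names and numbers are hypothetical.

```python
def mape(actual, predicted):
    """MAPE in percent; assumes no actual value is zero."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def wape(actual, predicted):
    """WAPE in percent: total absolute error over total actual volume.

    Assumes the actual values do not sum to zero.
    """
    total_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return 100.0 * total_error / sum(actual)


# Every forecast misses by 10 units; the last actual volume is tiny.
actual = [100, 200, 300, 1]
predicted = [110, 210, 310, 11]

print(mape(actual, predicted))  # roughly 254.6
print(wape(actual, predicted))  # roughly 6.7
```

The low-volume point dominates MAPE but barely moves WAPE, because it adds only 10 units to a total error of 40 over a total volume of 601.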
Source: https://stackoverflow.com/questions/12994929/whats-the-gaps-for-the-forecast-error-metrics-mape-and-wmape