Question
I am currently working on probability calibration. I am using a calibration approach called the rescaling algorithm (source: http://lem.cnrs.fr/Portals/2/actus/DP_201106.pdf, page 7).
The function I wrote is:
rescaling_fun = function(x, y, z) {
  P_korg = z # yhat_test_prob$BAD
  P_k_C1 = sum(as.numeric(y) - 1)/length(y) # testset$BAD
  P_kt_C1 = sum(as.numeric(x) - 1)/length(x) # trainset$BAD
  P_k_C0 = sum(abs(as.numeric(y) - 2))/length(y)
  P_kt_C0 = sum(abs(as.numeric(x) - 2))/length(x)
  P_new <- ((P_k_C1/P_kt_C1) * P_korg)/((P_k_C0/P_k_C0) * (1 - P_korg) + (P_k_C0/P_k_C1) * (P_korg))
  return(P_new)
}
The input values are:
1. x - train_set$BAD (actuals of `train set`)
2. y - test_set$BAD (actuals of `test set`)
3. z - yhat_test_prob$BAD (prediction on `test set`)
The problem: the resulting values are not within the range of 0 to 1. Could you please help me solve this?
Answer 1:
Your formulas for the class probabilities (P_k_C1, ...) need to be modified. For example, according to the paper, y is a binary variable (0, 1), so the formula sum(y - 1)/length(y) is negative whenever y contains a 0: it shifts the y values to -1 or 0 and then adds them up. I think it should be (sum(y) - 1)/length(y). Below is an example.
set.seed(1237)
y <- sample(0:1, 10, replace = TRUE)
y
[1] 0 1 0 0 0 1 1 0 1 1
# this cannot be positive: y is 0 or 1, so y - 1 is -1 or 0
sum(as.numeric(y) - 1)/length(y)
[1] -0.5
# modification
(sum(as.numeric(y)) - 1)/length(y)
[1] 0.4
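Another thing that may be worth checking is the denominator of P_new in the question: it weights (1 - P_korg) by P_k_C0/P_k_C0 and P_korg by P_k_C0/P_k_C1, neither of which matches the P_k_C1/P_kt_C1 weight used in the numerator. The following is only a minimal sketch of a prior-shift rescaling, assuming that the intended formula uses the same class-1 term in the numerator and the denominator, and that the class priors are estimated as plain class proportions (mean of the 0/1 labels) rather than with the (sum(y) - 1)/length(y) variant above. The names rescale_prob and to01 are hypothetical and do not come from the question or the paper.

```r
# Convert 0/1 labels to numeric, whether they are stored as numbers or as a factor.
# as.numeric() on a factor returns the level codes (1, 2), not the labels,
# so the conversion goes through as.character() first.
to01 <- function(v) {
  as.numeric(as.character(v))
}

# Sketch of a prior-shift rescaling (assumed form, see the text above).
# train_y: actual labels of the training set
# test_y:  actual labels of the test set
# p_test:  predicted class-1 probabilities on the test set
rescale_prob <- function(train_y, test_y, p_test) {
  prior_new_1 <- mean(to01(test_y))   # share of class 1 in the test set
  prior_old_1 <- mean(to01(train_y))  # share of class 1 in the training set
  prior_new_0 <- 1 - prior_new_1
  prior_old_0 <- 1 - prior_old_1

  # The class-1 term appears in both numerator and denominator,
  # so the ratio stays between 0 and 1 when all priors are positive.
  num <- (prior_new_1 / prior_old_1) * p_test
  den <- num + (prior_new_0 / prior_old_0) * (1 - p_test)
  num / den
}

# Made-up toy data, for illustration only
train_y <- factor(c(0, 0, 0, 1, 1, 1))
test_y  <- factor(c(0, 0, 0, 0, 1, 1))
p_test  <- c(0.1, 0.5, 0.9)
p_new   <- rescale_prob(train_y, test_y, p_test)
print(p_new)
```

With this form, a prediction of 0.5 is pulled toward the test-set share of class 1, and every rescaled value remains a valid probability as long as both classes occur in both sets.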
Source: https://stackoverflow.com/questions/29948919/calibration-of-the-posterior-probabilities