As noted in the comments by desertnaut, SVMs are not probabilistic classifiers; they do not actually produce probabilities.
One way to produce probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. However, training with a maximum likelihood score produces non-sparse kernel machines. Instead, after training an SVM, the parameters of an additional sigmoid function are trained to map the SVM outputs into probabilities (this is known as Platt scaling). Reference paper: Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods.
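For illustration, here is a minimal sketch of that sigmoid fit. The two-class subset of iris and the plain `glm()` are my choices for the example; kernlab (following Platt) instead fits the sigmoid on 3-fold cross-validated decision values with regularized targets.

```r
library(kernlab)

## Minimal sketch of the sigmoid (Platt) fit. Illustrative assumptions:
## a two-class subset of iris and a plain glm() as the sigmoid; kernlab/Platt
## fit the sigmoid on cross-validated decision values instead.
set.seed(1)
d <- droplevels(iris[iris$Species != "setosa", ])

## SVM trained without a probability model
fit <- ksvm(Species ~ ., data = d, kernel = "rbfdot", prob.model = FALSE)

## raw decision values f(x)
f <- as.numeric(predict(fit, d, type = "decision"))
y <- as.numeric(d$Species == levels(d$Species)[2])

## sigmoid P(y = 1 | f) = 1 / (1 + exp(A * f + B)), fitted here by logistic regression
platt <- glm(y ~ f, family = binomial)

## decision values mapped to probabilities
head(cbind(decision = f, prob = predict(platt, type = "response")))
```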
Caret method = "svmRadialSigma"
uses internally kernlab::ksvm with the argument kernel = "rbfdot"
. In order for this function to create probabilities the argument prob.model = TRUE
is needed. From the help of this function:
> prob.model: if set to TRUE builds a model for calculating class probabilities or in case of regression, calculates the scaling parameter of the Laplacian distribution fitted on the residuals. Fitting is done on output data created by performing a 3-fold cross-validation on the training data. For details see references. (default: FALSE)
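As an aside, this is roughly what the corresponding caret call looks like when class probabilities are requested; the data set, resampling settings and tuneLength are placeholder choices for the sketch. Setting `classProbs = TRUE` in `trainControl` is what leads to kernlab being fitted with `prob.model = TRUE` under the hood.

```r
library(caret)

## Hedged sketch of the caret side (data set, resampling and tuneLength are
## placeholder choices). With classProbs = TRUE caret requests class
## probabilities, which is what makes kernlab fit with prob.model = TRUE.
set.seed(1)
ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE)

svm_fit <- train(Species ~ ., data = iris,
                 method = "svmRadialSigma",
                 trControl = ctrl,
                 tuneLength = 2)

## class probabilities are now available
head(predict(svm_fit, iris, type = "prob"))
```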
The details referenced in the help above:
> In classification when prob.model is TRUE a 3-fold cross validation is performed on the data and a sigmoid function is fitted on the resulting decision values f.
It is clear that something very specific happens for classification models when posterior probabilities are needed, and it is different from simply outputting decision values.
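To make the distinction concrete, here is a small sketch (again on a two-class subset of iris with default hyperparameters, both chosen only for illustration): a fit with `prob.model = TRUE` carries the extra sigmoid model and can be queried for decision values, probabilities and class labels.

```r
library(kernlab)

## Sketch: with prob.model = TRUE the fitted object carries the extra sigmoid
## model, so probabilities can be requested in addition to decision values
## (two-class iris subset used purely for illustration).
set.seed(1)
d <- droplevels(iris[iris$Species != "setosa", ])
fit_prob <- ksvm(Species ~ ., data = d, kernel = "rbfdot", prob.model = TRUE)

head(predict(fit_prob, d, type = "decision"))       # raw decision values f(x)
head(predict(fit_prob, d, type = "probabilities"))  # probabilities from the fitted sigmoid
head(predict(fit_prob, d, type = "response"))       # class labels
```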
From this it can be deduced that, depending on the sigmoid function fit, some of the predicted classes and probabilities can differ from the predictions obtained when running `kernlab::ksvm` without a probability model (`prob.model = FALSE`), and this is what you are observing in the posted example.
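A hedged way to see this on a toy example is to compare the class implied by the highest probability (which is what caret reports when class probabilities are requested) with the decision-value-based labels returned by `type = "response"`. On an easy, well-separated data set the two may fully agree; borderline points are where they can flip.

```r
library(kernlab)

## Sketch of the possible discrepancy on the same two-class iris subset:
## labels implied by the highest probability versus decision-value-based labels.
set.seed(1)
d <- droplevels(iris[iris$Species != "setosa", ])
fit <- ksvm(Species ~ ., data = d, kernel = "rbfdot", prob.model = TRUE)

resp  <- predict(fit, d, type = "response")        # labels from decision values
probs <- predict(fit, d, type = "probabilities")   # probabilities from the sigmoid fit
from_prob <- factor(colnames(probs)[max.col(probs)], levels = levels(d$Species))

## off-diagonal counts show disagreements (there may be none on an easy data set)
table(decision_based = resp, probability_based = from_prob)
```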
Things get even more complicated if there are more than two classes.
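For the multiclass case, a short sketch on the full three-class iris data (again just an illustrative choice): the pairwise binary sigmoid fits have to be combined into a single probability per class, which puts the reported probabilities one further step away from the raw pairwise decision values.

```r
library(kernlab)

## Multiclass sketch on the full three-class iris data (illustrative choice):
## pairwise binary fits are combined into one probability per class.
set.seed(1)
fit3 <- ksvm(Species ~ ., data = iris, kernel = "rbfdot", prob.model = TRUE)

head(predict(fit3, iris, type = "decision"))       # pairwise decision values
head(predict(fit3, iris, type = "probabilities"))  # one column per class
```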
Further reading:
- Including class probabilities might skew a model in caret?
- Isn't caret SVM classification wrong when class probabilities are included?
- Why are probabilities and response in ksvm in R not consistent?
- [R] Inconsistent results between caret+kernlab versions