Getting probability of each new observation being an outlier when using scikit-learn OneClassSVM

Question

I'm new to scikit-learn and to SVM methods in general. I have my data set working well with scikit-learn's OneClassSVM for outlier detection: I train the OneClassSVM on observations that are all 'inliers', then use predict() to generate binary inlier/outlier predictions on my test set.

However, to take my analysis further I'd like a probability for each new observation in my test set, i.e. the probability that the observation is an outlier. I've noticed that other classification methods in scikit-learn accept the parameter probability=True to compute this, but OneClassSVM does not. Is there an easy way to get these results?

Answer 1:

I was searching for an answer to the same question when I found this page. After being stuck for some time, I went back to check the original LIBSVM package, since scikit-learn's OneClassSVM is based on the LIBSVM implementation, as stated in the scikit-learn documentation.

On the LIBSVM main page, the '-b' option, which enables probability output for some SVM variants, is described as follows:

    -b probability_estimates: whether to train a SVC or SVR model for probability estimates, 0 or 1 (default 0)

In other words, the one-class SVM, which is neither an SVC nor an SVR model, has no implementation of probability estimation.
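The same gap can be seen from the scikit-learn side. The following is a minimal sketch that only inspects the estimators' parameters and methods; it assumes a reasonably recent scikit-learn release, and the variable names are my own:

```python
from sklearn.svm import SVC, OneClassSVM

# Does each estimator accept a `probability` constructor parameter?
svc_has_flag = "probability" in SVC().get_params()
ocsvm_has_flag = "probability" in OneClassSVM().get_params()

# Does OneClassSVM expose a predict_proba method at all?
ocsvm_has_predict_proba = hasattr(OneClassSVM(), "predict_proba")

print("SVC accepts probability:", svc_has_flag)
print("OneClassSVM accepts probability:", ocsvm_has_flag)
print("OneClassSVM has predict_proba:", ocsvm_has_predict_proba)
```

On the versions I am aware of, both OneClassSVM checks come back False.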

If I try to force this option (-b) through the LIBSVM command-line interface, for example:

    ./svm-train -s 2 -t 2 -b 1 heart_scale

I receive the following error message:

    ERROR: one-class SVM probability output not supported yet

In summary, this much-desired output is not yet supported by LIBSVM, so scikit-learn does not offer it for the moment. I hope they add this functionality in the near future and update this thread.
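Until then, one thing that may help is decision_function(), which OneClassSVM does provide. It returns a signed score relative to the learned boundary (positive for inliers, negative for outliers). This is not a probability and is not calibrated, so treat it only as a way to rank observations. A minimal sketch, assuming synthetic 2-D data and arbitrarily chosen nu and gamma values:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic data (an assumption for illustration): inliers around the origin.
rng = np.random.RandomState(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Test set: 5 points like the training data, 5 points far away from it.
X_test = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(5, 2)),
    rng.normal(loc=8.0, scale=1.0, size=(5, 2)),
])

# nu and gamma are arbitrary here; tune them for real data.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma=0.1).fit(X_train)

labels = model.predict(X_test)            # +1 = inlier, -1 = outlier
scores = model.decision_function(X_test)  # signed score, NOT a probability

# Rank test observations from most to least outlying (lowest score first).
order = np.argsort(scores)
for i in order:
    print(f"obs {i}: label={labels[i]:+d}, score={scores[i]:.4f}")
```

The lower the score, the further the observation lies on the outlier side of the boundary, so the sorted order gives a relative ranking even though no probability is attached to it.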

Source: https://stackoverflow.com/questions/28390479/getting-probability-of-each-new-observation-being-an-outlier-when-using-scikit-l

Tags

scikit-learn

svm