I am using SKLearn to run SVC on my data.
from sklearn import svm
svc = svm.SVC(kernel='linear', C=C).fit(X, y)
How can I get the distance of each data point in X from the decision boundary?
For a linear kernel, the decision boundary is w * x + b = 0; decision_function returns w * x + b, so the signed distance from a point x to the boundary is that value divided by ||w||:
import numpy as np

y = svc.decision_function(X)
w_norm = np.linalg.norm(svc.coef_)
dist = y / w_norm
For non-linear kernels there is no way to get the absolute distance in the input space, but you can still use the result of decision_function as a relative distance to the boundary.
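For example, with an RBF kernel you can still rank points by the magnitude of decision_function to find which ones lie closest to the boundary. A minimal sketch; the make_blobs toy data is an assumption, substitute your own X and y:

import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Hypothetical toy data standing in for your own X, y.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

svc_rbf = svm.SVC(kernel='rbf', C=1.0).fit(X, y)

# decision_function values are not input-space distances here, but their
# magnitudes still order points by how close they sit to the boundary.
rel = svc_rbf.decision_function(X)
closest = np.argsort(np.abs(rel))[:5]  # indices of the 5 points nearest the boundary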
It happens that I am doing homework 1 of a course named Machine Learning Techniques, and it has a problem about a point's distance to the hyperplane even for the RBF kernel.
First, recall that SVM finds an "optimal" w for the hyperplane w * x + b = 0.
And the fact is that

w = \sum_{i} \alpha_i \phi(x_i)

where the x_i are the so-called support vectors and the \alpha_i are their dual coefficients. Note the \phi() applied to each x_i: it is the feature map that transforms x into some high-dimensional space (for RBF, an infinite-dimensional one). We also know the kernel trick,

\phi(x_1) \cdot \phi(x_2) = K(x_1, x_2)

We cannot write down w itself, since \phi has no explicit form for RBF, but we can compute its norm:

||w||^2 = w \cdot w = \sum_{i} \sum_{j} \alpha_i \alpha_j \phi(x_i) \cdot \phi(x_j) = \sum_{i} \sum_{j} \alpha_i \alpha_j K(x_i, x_j)

So the distance you want is

svc.decision_function(X) / w_norm

where w_norm is the norm calculated above.
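A sketch of that computation in scikit-learn: svc.dual_coef_ stores \alpha_i y_i for the support vectors, so the double sum becomes a quadratic form over the kernel matrix. The explicit gamma and the make_blobs toy data are assumptions; substitute your own values:

import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical toy data standing in for your own X, y.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

gamma = 0.5  # fixed explicitly so the same value can be reused below
svc = svm.SVC(kernel='rbf', gamma=gamma, C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector, so
# ||w||^2 = sum_i sum_j (alpha_i y_i)(alpha_j y_j) K(x_i, x_j).
alpha = svc.dual_coef_                             # shape (1, n_SV) for binary y
K = rbf_kernel(svc.support_vectors_, gamma=gamma)  # kernel matrix of support vectors
w_norm = np.sqrt((alpha @ K @ alpha.T).item())

# Distance of every point to the hyperplane, measured in the RBF feature space.
dist = svc.decision_function(X) / w_norm

This is the same decision_function / ||w|| recipe as in the linear case, only with ||w|| computed through the kernel instead of from svc.coef_.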
Source: https://stackoverflow.com/questions/32074239/sklearn-getting-distance-of-each-point-from-decision-boundary