I'm studying a scikit-learn example (Classifier comparison) and got confused by predict_proba and decision_function. They plot the classification results using one or the other of these methods, and I'm not sure what the difference is or when to prefer one over the other.
Your example is:

    if hasattr(clf, "decision_function"):
        # use the classifier's raw decision scores when it provides them...
        Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
    else:
        # ...otherwise fall back to the probability of the positive class
        Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
so the code uses decision_function if it exists. In the SVM case, predict_proba is computed (in the binary case) using Platt scaling, which is both "expensive" and has "theoretical issues". That's why decision_function is used here: as @Ami said, it is the margin, the distance to the separating hyperplane, which is accessible without much further computation. In the SVM case, it is therefore advised to use decision_function instead of predict_proba.
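For a concrete contrast, here is a minimal sketch on synthetic data (make_classification and the default SVC hyperparameters are just illustrative choices); passing probability=True is what triggers the extra Platt-scaling step at fit time:

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, random_state=0)

    # probability=True makes fit() run the additional Platt-scaling calibration
    clf = SVC(probability=True, random_state=0).fit(X, y)

    print(clf.decision_function(X[:3]))  # signed margins, shape (3,)
    print(clf.predict_proba(X[:3]))      # Platt-scaled probabilities, shape (3, 2)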
There are other decision_functions, e.g. SGDClassifier's. Here, predict_proba is only available for losses that actually model probabilities (log loss or modified Huber), whereas decision_function is available regardless of the loss.
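A quick sketch of that loss dependence (synthetic data again; note the log-loss option is spelled loss="log_loss" in recent scikit-learn versions, loss="log" in older ones):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import SGDClassifier

    X, y = make_classification(random_state=0)

    # hinge loss (the default): no probability model, so no predict_proba
    hinge = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
    print(hasattr(hinge, "predict_proba"))  # False
    print(hinge.decision_function(X[:2]))   # still works

    # log loss: a probabilistic model, so both methods are available
    logreg = SGDClassifier(loss="log_loss", random_state=0).fit(X, y)
    print(logreg.predict_proba(X[:2]))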
The former, predict_proba, is a method of a (soft) classifier outputting the probability of the instance being in each of the classes.
The latter, decision_function, finds the distance to the separating hyperplane. For example, an SVM classifier finds hyperplanes separating the feature space into regions associated with the classification outcomes; given a point, this function returns the point's distance to those separators.
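To see the difference in the outputs themselves, here is a small multiclass sketch (iris and LogisticRegression are arbitrary choices; any estimator exposing both methods would do):

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    print(clf.predict_proba(X[:2]))      # shape (2, 3): per-class probabilities, rows sum to 1
    print(clf.decision_function(X[:2]))  # shape (2, 3): unnormalized per-class scores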
I'd guess that predict_proba is more useful in your case in general; the other method is more specific to the algorithm.