Question
I have multi-class labels and want to compute the accuracy of my model.
I am unsure which sklearn function I need to use.
As far as I understand, the code below only applies to binary classification:
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# dividing X, y into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# training a linear SVM classifier
svm_model_linear = SVC(kernel='linear', C=1).fit(X_train, y_train)
svm_predictions = svm_model_linear.predict(X_test)
# model accuracy for X_test
accuracy = svm_model_linear.score(X_test, y_test)
print(accuracy)
From the linked question (Which decision_function_shape for sklearn.svm.SVC when using OneVsRestClassifier?), I understood that for multiclass classification I should use OneVsRestClassifier with decision_function_shape (set to ovr or ovo, checking which one works better):
from sklearn.multiclass import OneVsRestClassifier
svm_model_linear = OneVsRestClassifier(SVC(kernel='linear', C=1, decision_function_shape='ovr')).fit(X_train, y_train)
My main problem is that prediction time matters to me, but it takes about 1 minute to run the classifier and predict the data (on top of the time spent on feature reduction such as PCA, which also takes a while). Any suggestions for reducing the running time of the multiclass SVM classifier?
Answer 1:
There are multiple things to consider here:
1) OneVsRestClassifier will separate out all labels and train multiple SVM objects (one for each label) on the given data. So each time, only binary data is supplied to a single SVM object.
2) SVC is built on libsvm, which already handles multi-class problems with a one-vs-one ('OvO') strategy (liblinear is the backend of LinearSVC, not SVC). This is of no use here because of point 1: libsvm will only ever receive binary data. Even if it received multi-class data, decision_function_shape would not change how the model is trained; it only changes the shape of the output of decision_function. So it does not matter whether you pass decision_function_shape = 'ovr' or decision_function_shape = 'ovo'.
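The two points above can be checked with a small sketch. The digits dataset bundled with sklearn is only a stand-in for the asker's data, and the variable names are made up for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Stand-in data with 10 classes (an assumption, not the asker's dataset).
X, y = load_digits(return_X_y=True)

# A plain SVC accepts multi-class labels directly, with either setting.
svc_ovr = SVC(kernel="linear", C=1, decision_function_shape="ovr").fit(X, y)
svc_ovo = SVC(kernel="linear", C=1, decision_function_shape="ovo").fit(X, y)

# Only the shape of the decision_function output differs:
# one column per class for 'ovr', one column per pair of classes for 'ovo'.
shape_ovr = svc_ovr.decision_function(X[:5]).shape  # (5, 10)
shape_ovo = svc_ovo.decision_function(X[:5]).shape  # (5, 45)
print(shape_ovr, shape_ovo)

# With the default break_ties=False, the predicted labels are identical.
same_predictions = (svc_ovr.predict(X) == svc_ovo.predict(X)).all()
print(same_predictions)

# The wrapper fits one binary SVC per class, so each inner SVC sees only
# two labels and decision_function_shape has nothing to act on.
wrapped = OneVsRestClassifier(SVC(kernel="linear", C=1)).fit(X, y)
n_binary_models = len(wrapped.estimators_)
print(n_binary_models)
```

So for accuracy on multi-class labels, a plain SVC (without the wrapper) should already be enough.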
So it seems that you are looking at the problem the wrong way. decision_function_shape should not affect the speed. Try standardizing your data before fitting; SVMs work well with standardized data.
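A minimal sketch of the standardization suggestion, assuming the bundled wine dataset as a stand-in for the real data. The pipeline also shows that score and accuracy_score work unchanged on multi-class labels:

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data with 3 classes (an assumption, not the asker's dataset).
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The scaler is fitted on the training split only and then reused
# automatically when predicting on the test split.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1))
model.fit(X_train, y_train)

# Multi-class accuracy: fraction of test samples labelled correctly.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(accuracy)
```

Whether this speeds up your own fit depends on how badly scaled the features are, so it is worth timing both variants on the real data.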
Answer 2:
When wrapping models with the OvR or OvO classifiers, you can set the n_jobs parameter to make them run faster, e.g. sklearn.multiclass.OneVsOneClassifier(estimator, n_jobs=-1) or sklearn.multiclass.OneVsRestClassifier(estimator, n_jobs=-1).
Although each individual SVM classifier in sklearn can only use one CPU core at a time, the multi-class wrapper can fit several models in parallel when n_jobs is set.
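A rough sketch of the n_jobs idea, again with the bundled digits dataset as a stand-in. Any actual speed-up depends on the number of cores and the data size; on small data the process start-up overhead can outweigh the gain:

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

# Stand-in data with 10 classes (an assumption, not the asker's dataset).
X, y = load_digits(return_X_y=True)

base = SVC(kernel="linear", C=1)

# n_jobs=-1 asks for all available cores; the binary sub-problems
# are fitted in parallel rather than one after another.
ovr = OneVsRestClassifier(base, n_jobs=-1).fit(X, y)
ovo = OneVsOneClassifier(base, n_jobs=-1).fit(X, y)

n_ovr_models = len(ovr.estimators_)  # one binary model per class
n_ovo_models = len(ovo.estimators_)  # one binary model per pair of classes
print(n_ovr_models, n_ovo_models)
```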
Source: https://stackoverflow.com/questions/49848453/sklearn-multiclass-svm-function