sklearn multiclass svm function

问题

I have multi class labels and want to compute the accuracy of my model.
I am kind of confused on which sklearn function I need to use. As far as I understood the below code is only used for the binary classification.

# dividing X, y into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y,  test_size=0.25,random_state = 0)

# training a linear SVM classifier
from sklearn.svm import SVC
svm_model_linear = SVC(kernel = 'linear', C = 1).fit(X_train, y_train)
svm_predictions = svm_model_linear.predict(X_test)

# model accuracy for X_test  
accuracy = svm_model_linear.score(X_test, y_test)
print accuracy

and as I understood from the link: Which decision_function_shape for sklearn.svm.SVC when using OneVsRestClassifier?

for multiclass classification I should use OneVsRestClassifier with decision_function_shape (with ovr or ovo and check which one works better)

svm_model_linear = OneVsRestClassifier(SVC(kernel = 'linear',C = 1, decision_function_shape = 'ovr')).fit(X_train, y_train)

The main problem is that the time of predicting the labels does matter to me but it takes about 1 minute to run the classifier and predict the data (also this time is added to the feature reduction such as PCA which also takes sometime)? any suggestions to reduce the time for svm multiclassifer?

回答1:

There are multiple things to consider here:

1) You see, OneVsRestClassifier will separate out all labels and train multiple svm objects (one for each label) on the given data. So each time, only binary data will be supplied to single svm object.

2) SVC internally uses libsvm and liblinear, which have a 'OvO' strategy for multi-class or multi-label output. But this point will be of no use because of point 1. libsvm will only get binary data.

Even if it did, it doesnt take into account the 'decision_function_shape'. So it does not matter if you provide decision_function_shape = 'ovr' or decision_function_shape = 'ovr'.

So it seems that you are looking at the problem wrong. decision_function_shape should not affect the speed. Try standardizing your data before fitting. SVMs work well with standardized data.

回答2:

When wrapping models with the ovr or ovc classifiers, you could set the n_jobs parameters to make them run faster, e.g. sklearn.multiclass.OneVsOneClassifier(estimator, n_jobs=-1) or sklearn.multiclass.OneVsRestClassifier(estimator, n_jobs=-1).

Although each single SVM classifier in sklearn could only use one CPU core at a time, the ensemble multi class classifier could fit multiple models at the same time by setting n_jobs.

来源：https://stackoverflow.com/questions/49848453/sklearn-multiclass-svm-function

标签

machine-learning

scikit-learn

classification

svm

pca