According to the official libsvm documentation (Section 7):
LIBSVM implements the "one-against-one" approach for multi-class
classification. If k is the number of classes, then k(k-1)/2
classifiers are constructed and each one trains data from two
classes.
In classification we use a voting strategy: each binary
classification is considered to be a voting where votes can be cast
for all data points x - in the end a point is designated to be in a
class with the maximum number of votes.
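To make the voting step concrete, here is a minimal sketch in plain MATLAB for a single test point with k = 3. The `winners` vector is made up for illustration (it stands in for the outputs of the three pairwise classifiers), and this is not how LIBSVM implements voting internally.

```matlab
%# one-against-one voting for a single test point (illustrative values only)
k = 3;                                  %# number of classes
pairs = nchoosek(1:k, 2);               %# the k(k-1)/2 class pairs: [1 2; 1 3; 2 3]
winners = [1; 3; 3];                    %# hypothetical winner of each pairwise classifier
votes = accumarray(winners, 1, [k 1]);  %# count the votes each class received
[~, predicted] = max(votes);            %# class with the maximum number of votes
```

With these made-up winners, class 3 collects two of the three votes and is the predicted class.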
In the one-against-all approach, we build as many binary classifiers as there are classes, each trained to separate one class from the rest. To predict a new instance, we choose the classifier with the largest decision function value.
As I mentioned before, the idea is to train k SVM models, each one separating one class from the rest. Once we have those binary classifiers, we use the probability outputs (the -b 1 option) to predict new instances by picking the class with the highest probability.
Consider the following example:
%# Fisher Iris dataset
load fisheriris
[~,~,labels] = unique(species); %# labels: 1/2/3
data = zscore(meas); %# scale features
numInst = size(data,1);
numLabels = max(labels);
%# split training/testing
idx = randperm(numInst);
numTrain = 100; numTest = numInst - numTrain;
trainData = data(idx(1:numTrain),:); testData = data(idx(numTrain+1:end),:);
trainLabel = labels(idx(1:numTrain)); testLabel = labels(idx(numTrain+1:end));
Here is my implementation for the one-against-all approach for multi-class SVM:
%# train one-against-all models
model = cell(numLabels,1);
for k=1:numLabels
    model{k} = svmtrain(double(trainLabel==k), trainData, '-c 1 -g 0.2 -b 1');
end
%# get probability estimates of test instances using each model
prob = zeros(numTest,numLabels);
for k=1:numLabels
    [~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
    prob(:,k) = p(:,model{k}.Label==1);    %# probability of class==k
end
%# predict the class with the highest probability
[~,pred] = max(prob,[],2);
acc = sum(pred == testLabel) ./ numel(testLabel) %# accuracy
C = confusionmat(testLabel, pred) %# confusion matrix
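The code above picks the class with the highest probability. A rough sketch of the other variant described earlier, picking the classifier with the largest decision function value, could look like the following. It assumes the libsvm MATLAB interface (svmtrain/svmpredict) is on the path, and that for a two-class model the decision value is signed relative to model.Label(1), which is worth verifying against your libsvm version. The names dv, m, trainIdx and testIdx are my own.

```matlab
%# one-against-all using decision values instead of probabilities (sketch)
load fisheriris
[~,~,labels] = unique(species);          %# labels: 1/2/3
data = zscore(meas);                     %# scale features
idx = randperm(size(data,1));
trainIdx = idx(1:100); testIdx = idx(101:end);
numLabels = max(labels);

dv = zeros(numel(testIdx), numLabels);   %# one column of decision values per class
for k=1:numLabels
    %# train class k against the rest (no -b 1 needed here)
    m = svmtrain(double(labels(trainIdx)==k), data(trainIdx,:), '-c 1 -g 0.2');
    [~,~,d] = svmpredict(double(labels(testIdx)==k), data(testIdx,:), m);
    %# assumption: d is positive for m.Label(1); flip so positive means class k
    dv(:,k) = d * (2*m.Label(1) - 1);
end

%# predict the class whose classifier gives the largest decision value
[~,pred] = max(dv,[],2);
acc = mean(pred == labels(testIdx))      %# accuracy
```

The sign flip matters because libsvm orders the two classes by their first appearance in the training data, so the "positive" side of the decision function is not always the class labelled 1.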