I've modified the Caffe MNIST example to classify 3 classes of images. One thing I noticed was that if I specify the number of outputs as 3, then my test accuracy drops.
It is important to note that when fine-tuning and/or changing the number of labels, the input labels must always start from 0, since they are used as indices into the output probability vector when computing the loss.
Thus, if you have
inner_product_param {
num_output: 3
}
you must use training labels 0, 1 and 2 only.
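To see why, here is a minimal numpy sketch of the indexing that the softmax loss performs; this is an illustration, not caffe's actual code:

import numpy as np

def softmax_loss(scores, label):
    """Multinomial logistic loss: -log(p[label]), where p
    is the softmax of the raw scores for one example."""
    p = np.exp(scores - scores.max())  # shift for numerical stability
    p /= p.sum()
    return -np.log(p[label])

scores = np.array([1.2, 0.3, -0.5])   # num_output: 3 -> 3 scores
print(softmax_loss(scores, 2))        # labels 0..2 index valid entries
try:
    print(softmax_loss(scores, 3))    # label 3 has no entry to index
except IndexError as e:
    print("label 3 is out of range:", e)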
If you use num_output: 3
with labels 1, 2 and 3, caffe is unable to represent label 3, and in fact the row of the parameter matrix corresponding to label 0 is redundant and left unused.
As you observed, when changing to num_output: 4
caffe is again able to represent label 3 and the results improved, but you still have an unused row in the parameter matrix (the one for label 0, which never occurs in your data).
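The clean fix is to keep num_output: 3 and shift your labels down so they start at 0. A minimal sketch, assuming your labels live in a "path label" list file of the kind caffe's ImageData layer reads (the file names here are hypothetical):

# remap labels 1,2,3 -> 0,1,2 in an image/label list file,
# one "path label" pair per line
with open("train.txt") as src, open("train_remapped.txt", "w") as dst:
    for line in src:
        path, label = line.rsplit(None, 1)  # split off the trailing label
        dst.write("%s %d\n" % (path, int(label) - 1))

After remapping, the three rows of the num_output: 3 parameter matrix each correspond to a real class, with no redundant row.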