I am currently trying to get a decent score (> 40% accuracy) with Keras on CIFAR 100. However, I'm experiencing a weird behaviour of a CNN model: it tends to predict some classes much more often than others.
I don't have a good feeling about this part of the code:
model.add(Dense(1024))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
The remaining model is full of relus, but here there is a tanh. tanh saturates at -1 and 1, where its gradient vanishes, which might lead to the two-class over-importance you are seeing.
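As a quick check, here is a sketch of the same classifier head with the tanh swapped for a relu (Dense, Activation, Dropout, and nb_classes as in your code):

model.add(Dense(1024))
model.add(Activation('relu'))  # relu instead of tanh: no saturation at -1/1
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))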
The Keras CIFAR-10 example basically uses the same architecture (the dense-layer sizes might differ), but it also uses a relu there (no tanh at all). The same goes for this external Keras-based CIFAR-100 code.
If you get good accuracy during training and validation but not when testing, make sure you do exactly the same preprocessing on your dataset in both cases. During training you have:
X_train /= 255
X_val /= 255
X_test /= 255
But there is no such code when predicting for your confusion matrix. Adding the same scaling to the test step:
X_val /= 255.
gives the following nice-looking confusion matrix:
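Concretely, the test-time path should mirror the training preprocessing. A minimal sketch, assuming a trained Sequential model and integer labels y_test (sklearn's confusion_matrix is used here purely for illustration):

from sklearn.metrics import confusion_matrix  # illustration only, not from the original code

# apply exactly the same preprocessing as at training time
X_test = X_test.astype('float32')
X_test /= 255.

y_pred = model.predict_classes(X_test)  # Keras 1.x Sequential API
print(confusion_matrix(y_test, y_pred))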
I don't see you doing mean-centering, even in datagen; I suspect this is the main cause. To do mean-centering with ImageDataGenerator, set featurewise_center = 1 (and call datagen.fit(X_train) so the mean can actually be computed). Another way is to subtract the ImageNet mean from each RGB pixel; the mean vector to subtract is [103.939, 116.779, 123.68].
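A minimal sketch of the first option, assuming the X_train/y_train, model, and nb_epoch from your code (Keras 1.x API):

from keras.preprocessing.image import ImageDataGenerator

# featurewise_center=True makes the generator subtract the dataset mean
datagen = ImageDataGenerator(featurewise_center=True)
datagen.fit(X_train)  # computes the mean over the training set

model.fit_generator(datagen.flow(X_train, y_train, batch_size=32),
                    samples_per_epoch=len(X_train), nb_epoch=nb_epoch)

Note that the test data then has to go through the same fitted generator (e.g. datagen.flow(X_test, shuffle=False)) so that the same mean is subtracted there too.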
Make all activations relus, unless you have a specific reason for the single tanh.
Remove the two dropouts of 0.25 and see what happens. If you want to apply dropout to convolution layers, it is better to use SpatialDropout2D. It has somehow been removed from the online Keras documentation, but you can find it in the source.
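A sketch of how one convolution block might use it in place of a plain Dropout (Keras 1.x layer names; the import path is taken from the source):

from keras.layers import SpatialDropout2D

model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(SpatialDropout2D(0.25))  # drops whole feature maps rather than individual units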
You have two conv layers with same padding and two with valid. There is nothing wrong with this, but it would be simpler to keep all conv layers with same padding and control the feature-map size with max-pooling alone.
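For illustration, a sketch of such a stack where every conv layer uses same padding and only the poolings shrink the feature maps (tf dim ordering assumed, so the CIFAR input is 32x32x3):

from keras.models import Sequential
from keras.layers import Convolution2D, MaxPooling2D, Activation

model = Sequential()
# border_mode='same' keeps the spatial size, so it goes 32 -> 16 -> 8 via pooling only
model.add(Convolution2D(32, 3, 3, border_mode='same', input_shape=(32, 32, 3)))
model.add(Activation('relu'))
model.add(Convolution2D(32, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Convolution2D(64, 3, 3, border_mode='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))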
One important part of the problem was that my ~/.keras/keras.json was:
{
    "image_dim_ordering": "th",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}
Hence I had to change image_dim_ordering to tf.
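For reference, the corrected ~/.keras/keras.json (only image_dim_ordering changes):

{
    "image_dim_ordering": "tf",
    "epsilon": 1e-07,
    "floatx": "float32",
    "backend": "tensorflow"
}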
This gives an accuracy of 12.73%. Obviously, there is still a problem, as the validation history gave 45.1% accuracy.