Why does my CIFAR 100 CNN model mainly predict two classes?

不思量自难忘° 2021-01-17 00:06

I am currently trying to get a decent score (> 40% accuracy) with Keras on CIFAR-100. However, I'm experiencing weird behaviour from a CNN model: it tends to predict a couple of classes much more often than the others.

4 Answers
  • 2021-01-17 00:47

    I don't have a good feeling about this part of the code:

    model.add(Dense(1024))
    model.add(Activation('tanh'))  # the only tanh in the model
    model.add(Dropout(0.5))
    model.add(Dense(nb_classes))
    model.add(Activation('softmax'))
    

    The rest of the model is full of relus, but here there is a tanh.

    tanh saturates at -1 and 1, so its gradients can vanish or its outputs can pin to the extremes, which might lead to the over-importance of your two classes.

    The Keras CIFAR-10 example basically uses the same architecture (the dense-layer sizes might differ), but also uses a relu there (no tanh at all). The same goes for this external Keras-based CIFAR-100 code.
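    A minimal sketch of the suggested change, reusing the head from the question (model and nb_classes are the question's own names):

    # Hypothetical fix: use relu in the dense head too, matching the conv layers.
    model.add(Dense(1024))
    model.add(Activation('relu'))  # was 'tanh'
    model.add(Dropout(0.5))
    model.add(Dense(nb_classes))
    model.add(Activation('softmax'))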

  • 2021-01-17 00:49

    If you get good accuracy during training and validation but not when testing, make sure you apply exactly the same preprocessing to your dataset in both cases. When training you have:

    X_train /= 255  # scale pixel values into [0, 1]
    X_val /= 255
    X_test /= 255
    

    But there is no such code when you predict for your confusion matrix. Adding this to the testing step:

    X_val /=  255.
    

    gives a much nicer-looking confusion matrix.
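    A minimal sketch of the full evaluation path with matching preprocessing, assuming one-hot labels and scikit-learn's confusion_matrix (X_val, Y_val and model are the question's names):

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Apply the SAME scaling used at training time before predicting.
    X_val = X_val.astype('float32') / 255.

    # Reduce predicted probabilities and one-hot targets to class indices.
    y_pred = np.argmax(model.predict(X_val), axis=1)
    y_true = np.argmax(Y_val, axis=1)

    print(confusion_matrix(y_true, y_pred))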

  • 2021-01-17 01:03
    1. I don't see you doing mean-centering anywhere, not even in datagen; I suspect this is the main cause. To do mean-centering with ImageDataGenerator, set featurewise_center=True and fit the generator on the training data so the mean can actually be computed (see the sketch after this list). Another way is to subtract the ImageNet mean from each pixel; the usual mean vector is [103.939, 116.779, 123.68] (note these values are in BGR channel order, as used by the Caffe VGG models).

    2. Make all activations relus, unless you have a specific reason to have a single tanh.

    3. Remove the two dropouts of 0.25 and see what happens. If you want to apply dropout to a convolution layer, it is better to use SpatialDropout2D. It has somehow been removed from the Keras online documentation, but you can find it in the source.

    4. You have two conv layers with 'same' padding and two with 'valid'. There is nothing wrong with this, but it would be simpler to keep all conv layers at 'same' and control the spatial size with the max-poolings alone.
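    A minimal sketch of point 1, assuming the Keras 1.x API the question appears to use (X_train, Y_train and model are the question's names; batch size and epoch count are placeholders):

    from keras.preprocessing.image import ImageDataGenerator

    # featurewise_center subtracts the training-set mean from every image.
    datagen = ImageDataGenerator(featurewise_center=True)
    datagen.fit(X_train)  # computes the mean that will be subtracted

    # Train through the generator so the centering is applied to every batch.
    model.fit_generator(datagen.flow(X_train, Y_train, batch_size=32),
                        samples_per_epoch=len(X_train), nb_epoch=10)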

  • 2021-01-17 01:05

    One important part of the problem was that my ~/.keras/keras.json was:

    {
        "image_dim_ordering": "th",
        "epsilon": 1e-07,
        "floatx": "float32",
        "backend": "tensorflow"
    }
    

    Hence I had to change image_dim_ordering to tf. This leads to a different confusion matrix and an accuracy of 12.73%. Obviously, there is still a problem, as the validation history gave 45.1% accuracy.
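    For context: with image_dim_ordering set to 'th', Keras expects image arrays shaped (channels, rows, cols), while 'tf' means (rows, cols, channels); a mismatch silently feeds scrambled pixels to the network. A minimal sketch for checking and overriding the setting in code, assuming Keras 1.x where these backend helpers exist:

    from keras import backend as K

    # 'th' -> (channels, rows, cols); 'tf' -> (rows, cols, channels)
    print(K.image_dim_ordering())

    # Override ~/.keras/keras.json for this session.
    K.set_image_dim_ordering('tf')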
