Relationship between loss and accuracy

失恋的感觉 2020-12-29 12:19

Is it practically possible to have decreasing loss and decreasing accuracy at each epoch when training a CNN model? I am getting such results while training.

4 Answers
  •  小蘑菇 (OP)
    2020-12-29 13:18

    There are at least five reasons that might cause such behavior:

    1. Outliers: imagine that you have 10 identical images, 9 of which are labeled class A and one class B. In this case, the model will start to assign a high probability of class A to all of them, because A is the majority label. But then the signal from the outlier can destabilize the model and make accuracy decrease. In theory, the model should stabilize at assigning a 90% score to class A, but that may take many epochs.

      Solutions: to deal with such examples, use gradient clipping (most optimizers accept it as an option). If you want to check whether this phenomenon occurs, inspect the distribution of per-example losses on the training set and look for outliers; a sketch of both follows.
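      Below is a minimal sketch of both ideas in tf.keras (the question does not name a framework, so the model and data here are toy placeholders, not the asker's setup):

      ```python
      import numpy as np
      import tensorflow as tf

      # Toy stand-ins; replace with your real CNN and data.
      x_train = np.random.rand(256, 32, 32, 3).astype("float32")
      y_train = tf.keras.utils.to_categorical(
          np.random.randint(0, 10, size=256), num_classes=10)

      model = tf.keras.Sequential([
          tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3)),
          tf.keras.layers.GlobalAveragePooling2D(),
          tf.keras.layers.Dense(10, activation="softmax"),
      ])

      # Gradient clipping via the optimizer: clipnorm caps the norm of each
      # gradient; most built-in Keras optimizers accept it as an argument.
      model.compile(optimizer=tf.keras.optimizers.Adam(clipnorm=1.0),
                    loss="categorical_crossentropy", metrics=["accuracy"])
      model.fit(x_train, y_train, epochs=1, verbose=0)

      # Per-example losses: reduction=NONE keeps one loss value per example,
      # so outliers show up as extreme values in the distribution.
      loss_fn = tf.keras.losses.CategoricalCrossentropy(
          reduction=tf.keras.losses.Reduction.NONE)
      per_example = loss_fn(y_train, model.predict(x_train, verbose=0)).numpy()
      print("loss percentiles (50/90/99):", np.percentile(per_example, [50, 90, 99]))
      ```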

    2. Bias: now imagine that you have 10 identical images, but 5 of them are labeled class A and 5 class B. In this case, the model will try to assign an approximately 50/50 distribution over the two classes, so it can achieve at most 50% accuracy on these examples, whichever of the two valid classes it picks.

      Solution: try to increase the model capacity - very often you have a set of really similar images, and adding expressive power may help to discriminate between them. Beware of overfitting, though. Another solution is to try this strategy in your training. If you want to check whether this phenomenon occurs, check the distribution of per-example losses: if it is skewed toward higher values, you are probably suffering from this kind of bias (see the sketch after this item).
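      A small sketch for quantifying that skew (the placeholder loss values are an assumption; in practice you would reuse the per-example losses computed above):

      ```python
      import numpy as np

      # `per_example` would be the per-example training losses from the
      # previous sketch; gamma-distributed placeholders are used here.
      per_example = np.random.gamma(shape=2.0, scale=0.5, size=1000)

      # Moment-based skewness: a strongly positive value means a long right
      # tail of hard examples, consistent with conflicting labels.
      z = (per_example - per_example.mean()) / per_example.std()
      skewness = float(np.mean(z ** 3))
      print(f"skewness: {skewness:.2f}")
      ```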

    3. Class imbalance: now imagine that 90% of your images belong to class A. In the early stage of training, your model concentrates mainly on assigning this class to almost every example. This can push the individual losses on minority-class examples to really high values and destabilize your model by making the predicted distribution more unstable.

      Solution: once again, gradient clipping. Second, patience: try simply leaving your model to train for more epochs; it should learn the more subtle patterns in a later phase of training. And of course, try class balancing by assigning sample_weights or class_weights (a sketch follows). If you want to check whether this phenomenon occurs, check your class distribution.
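      A hedged sketch of computing balanced class weights with scikit-learn and handing them to Keras (the 90/5/5 split and all names here are illustrative):

      ```python
      import numpy as np
      from sklearn.utils.class_weight import compute_class_weight

      # Toy integer labels for a 3-class problem with a 90/5/5 split.
      y = np.concatenate([np.zeros(900), np.ones(50), np.full(50, 2)]).astype(int)

      # 'balanced' weights each class inversely to its frequency.
      weights = compute_class_weight(class_weight="balanced",
                                     classes=np.unique(y), y=y)
      class_weight = dict(enumerate(weights))
      print(class_weight)  # roughly {0: 0.37, 1: 6.67, 2: 6.67}

      # In Keras you would pass the dict to fit (model and x assumed to exist):
      # model.fit(x, y, class_weight=class_weight, epochs=10)
      ```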

    4. Too strong regularization: if you set your regularization too strict, the training process concentrates mainly on making your weights have a smaller norm rather than on actually learning anything interesting.

      Solution: add categorical_crossentropy as a metric and observe whether it is also decreasing. If not, your regularization is too strict; try a smaller weight penalty (see the sketch below).
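      A minimal sketch of that check in Keras (the l2 factor of 1e-4 is only an example value): the reported loss includes the weight penalty, while the categorical_crossentropy metric reports the data term alone, so the two can be compared directly.

      ```python
      import tensorflow as tf
      from tensorflow.keras import layers, regularizers

      model = tf.keras.Sequential([
          layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3),
                        kernel_regularizer=regularizers.l2(1e-4)),  # example penalty
          layers.GlobalAveragePooling2D(),
          layers.Dense(10, activation="softmax"),
      ])
      # If `loss` keeps falling but the crossentropy metric does not, the
      # optimizer is shrinking weights instead of fitting the data.
      model.compile(optimizer="adam",
                    loss="categorical_crossentropy",
                    metrics=["accuracy", "categorical_crossentropy"])
      ```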

    5. Bad model design: such behavior might also be caused by poor model design. There are several good practices you can apply to improve your model:

      Batch Normalization - this technique prevents radical shifts in the distribution of the network's inner activations, which makes training much more stable and efficient. With a small batch size it can also act as a genuine regularizer (see the sketch after this list).

      Gradient clipping - this makes your model training much more stable and efficient.

      Reduce the bottleneck effect - read this fantastic paper and check whether your model might suffer from the bottleneck problem.

      Add auxiliary classifiers - if you are training your network from scratch, these should make your features much more meaningful and your training faster and more efficient.
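      As a minimal sketch of the Batch Normalization advice above, here is the common Conv -> BatchNorm -> ReLU pattern in tf.keras (filter counts and input shape are placeholders, not recommendations):

      ```python
      import tensorflow as tf
      from tensorflow.keras import layers

      def conv_bn_block(x, filters):
          x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
          x = layers.BatchNormalization()(x)  # stabilizes activation statistics
          return layers.ReLU()(x)

      inputs = tf.keras.Input(shape=(32, 32, 3))
      x = conv_bn_block(inputs, 32)
      x = layers.MaxPooling2D()(x)
      x = conv_bn_block(x, 64)
      x = layers.GlobalAveragePooling2D()(x)
      outputs = layers.Dense(10, activation="softmax")(x)
      model = tf.keras.Model(inputs, outputs)
      model.summary()
      ```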
