Question
I'm training a ResNet on the CIFAR-10 dataset and the training accuracy is mostly increasing (in about 95% of epochs), but sometimes it drops by 5-10% and then starts increasing again.
Here is an example:
Epoch 45/100
40000/40000 [==============================] - 50s 1ms/step - loss: 0.0323 - acc: 0.9948 - val_loss: 1.6562 - val_acc: 0.7404
Epoch 46/100
40000/40000 [==============================] - 52s 1ms/step - loss: 0.0371 - acc: 0.9932 - val_loss: 1.6526 - val_acc: 0.7448
Epoch 47/100
40000/40000 [==============================] - 50s 1ms/step - loss: 0.0266 - acc: 0.9955 - val_loss: 1.6925 - val_acc: 0.7426
Epoch 48/100
40000/40000 [==============================] - 50s 1ms/step - loss: 0.0353 - acc: 0.9940 - val_loss: 2.2682 - val_acc: 0.6496
Epoch 49/100
40000/40000 [==============================] - 50s 1ms/step - loss: 1.6391 - acc: 0.4862 - val_loss: 1.2524 - val_acc: 0.5659
Epoch 50/100
40000/40000 [==============================] - 52s 1ms/step - loss: 0.9220 - acc: 0.6830 - val_loss: 0.9726 - val_acc: 0.6738
Epoch 51/100
40000/40000 [==============================] - 51s 1ms/step - loss: 0.5453 - acc: 0.8165 - val_loss: 1.0232 - val_acc: 0.6963
I quit execution after this, but this was my second run; in the first one the same thing happened, and after some time the accuracy got back to 99%.
The batch size is 128, so I guess that's not the problem. I haven't changed the learning rate or any other Adam parameters, but I guess that's also not the issue, since accuracy is increasing most of the time.
So, why are those sudden drops happening?
Answer 1:
Since the training and validation loss both spike (and accuracy drops) at the same time, it looks like your optimization algorithm has temporarily overshot the downhill part of the loss surface it was trying to follow.
Remember that gradient descent and related methods calculate the gradient at a point and then use it (sometimes together with some additional state) to choose the direction and distance to move. This is not always perfect, and sometimes a step goes too far and ends up further uphill again.
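To make that overshoot concrete, here is a minimal sketch (not from the original answer) of a single plain gradient-descent step on a 1-D quadratic loss: a cautious step size moves downhill, while an overly aggressive one jumps past the minimum and the loss goes back up.

# Minimal illustration: one gradient-descent step on f(w) = w**2, minimum at w = 0.
def loss(w):
    return w ** 2

def grad(w):
    return 2 * w

w = 1.0
for lr in (0.1, 1.2):            # cautious step size vs. aggressive step size
    w_new = w - lr * grad(w)     # one gradient-descent update from w = 1.0
    print("lr=%.1f: loss %.3f -> %.3f" % (lr, loss(w), loss(w_new)))
# lr=0.1 moves downhill (1.000 -> 0.640); lr=1.2 overshoots the minimum and the
# loss increases (1.000 -> 1.960), analogous to the temporary spike above.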
If your learning rate is aggressive, you will see this every now and then, but you might still converge faster than with a smaller learning rate. You can experiment with different learning rates, but I would not be concerned unless your loss starts to diverge; a sketch of such an experiment follows below.
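If you do want to experiment, here is a minimal Keras sketch. It assumes an already-built `model` and the usual CIFAR-10 arrays x_train, y_train, x_val, y_val; the specific values (lr=1e-4, clipnorm=1.0, the ReduceLROnPlateau settings) are illustrative starting points, not recommendations from the original answer.

# Minimal sketch, assuming `model`, x_train, y_train, x_val, y_val already exist.
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau

# Smaller learning rate than Adam's default 1e-3, plus gradient-norm clipping
# so that a single update cannot move the weights too far.
opt = Adam(lr=1e-4, clipnorm=1.0)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['acc'])

# Optionally halve the learning rate whenever the validation loss stops improving.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5)

model.fit(x_train, y_train,
          batch_size=128,
          epochs=100,
          validation_data=(x_val, y_val),
          callbacks=[reduce_lr])

Lowering the learning rate (or clipping the gradient norm) makes each update smaller, which reduces how often and how badly the optimizer overshoots, at the cost of potentially slower convergence.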
Source: https://stackoverflow.com/questions/47465006/train-accuracy-drops-in-some-epochs