Reducing (Versus Delaying) Overfitting in Neural Networks

别来无恙 · Submitted 2019-12-14 04:04:15

Question


In neural nets, regularization (e.g. L2, dropout) is commonly used to reduce overfitting. For example, the plot below shows typical loss vs. epoch curves with and without dropout: solid lines = training, dashed lines = validation; blue = baseline (no dropout), orange = with dropout. Plot courtesy of the TensorFlow tutorials. Weight regularization behaves similarly.
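For reference, the two regularizers mentioned can be sketched in a few lines of plain Python; the function names here are illustrative, not taken from any framework:

```python
import random

def l2_penalty(weights, lam):
    # L2 weight regularization adds lam * sum(w^2) to the data loss,
    # penalizing large weights.
    return lam * sum(w * w for w in weights)

def dropout(activations, p, training=True):
    # Inverted dropout: at train time, zero each unit with probability p
    # and rescale the survivors by 1/(1-p) so the expected activation is
    # unchanged; at test time, pass activations through untouched.
    if not training:
        return activations
    return [0.0 if random.random() < p else a / (1.0 - p)
            for a in activations]

# Total training loss = data loss + L2 penalty on the weights.
weights = [0.5, -1.0, 2.0]
total_loss = 0.8 + l2_penalty(weights, lam=0.01)
```

Both techniques constrain the network during training; the question is whether that constraint actually lowers the best achievable validation loss.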

Regularization delays the epoch at which validation loss starts to increase, but regularization apparently does not decrease the minimum value of validation loss (at least in my models and the tutorial from which the above plot is taken).

Suppose we use early stopping to halt training at the minimum validation loss (to avoid overfitting). If regularization only delays the epoch at which that minimum occurs, rather than decreasing its value, then regularization does not yield a network that generalizes better; it merely slows down training.

How can regularization be used to reduce the minimum validation loss (to improve model generalization) as opposed to just delaying it? If regularization is only delaying minimum validation loss and not reducing it, then why use it?
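The early-stopping policy described above (halt once validation loss has stopped improving for a few epochs, and keep the epoch with the best weights) can be sketched as a standalone function; the patience logic mirrors what frameworks such as Keras provide, but this code is a self-contained illustration, not any library's implementation:

```python
def early_stop_epoch(val_losses, patience=3):
    # Return the index of the minimum validation loss, stopping the scan
    # once `patience` consecutive epochs fail to improve on the best so far.
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# A typical curve: validation loss falls, bottoms out, then rises as the
# network begins to overfit. Early stopping picks the epoch at the bottom.
curve = [1.0, 0.7, 0.5, 0.45, 0.48, 0.55, 0.63]
best = early_stop_epoch(curve)  # index of the 0.45 minimum
```

Under this policy, a regularizer that only shifts the minimum to a later epoch changes `best` but not `val_losses[best]`, which is precisely the concern raised above.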


Answer 1:


Over-generalizing from a single tutorial plot is arguably not a good idea; here is a relevant plot from the original dropout paper:

Clearly, if the effect of dropout were merely to delay convergence, it would not be of much use. But of course it does not always work (as your plot clearly suggests), hence it should not be used by default (which is arguably the lesson here)...



Source: https://stackoverflow.com/questions/57894274/reducing-versus-delaying-overfitting-in-neural-network
