Batch normalization instead of input normalization

Maxim

You can do it. But the nice thing about batchnorm, in addition to stabilizing the activation distributions, is that the mean and standard deviation are likely to shift as the network learns.

Effectively, putting batchnorm right after the input layer is a fancy data pre-processing step. It helps, sometimes a lot (e.g. in linear regression). But it's easier and more efficient to compute the mean and variance of the whole training sample once than to learn them per batch. Note that batchnorm isn't free in terms of performance, and you shouldn't abuse it.
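
As a rough illustration of the two options, here is a minimal sketch (assuming Keras/TensorFlow 2.x and a made-up 20-feature dataset; the question didn't specify a framework or data):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

x_train = np.random.rand(1000, 20).astype("float32")  # hypothetical training data

# Option 1: batchnorm right after the input layer -- mean/variance are
# estimated per batch and updated as the network trains.
model_bn = keras.Sequential([
    layers.Input(shape=(20,)),
    layers.BatchNormalization(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),
])

# Option 2: compute the statistics of the whole training sample once and
# normalize the inputs before they ever reach the network.
mean = x_train.mean(axis=0)
std = x_train.std(axis=0) + 1e-7  # avoid division by zero
x_train_normalized = (x_train - mean) / std

model_plain = keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),
])
```

With option 2 you pay the normalization cost once at preprocessing time instead of adding an extra layer that runs on every forward pass.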

