Batch normalization when batch size=1
Question: What will happen when I use batch normalization but set batch_size = 1? Because I am using 3D medical images as my training dataset, the batch size can only be set to 1 due to GPU memory limitations. Normally, I know that when batch_size = 1, the variance will be 0, and (x - mean)/variance would lead to an error because of division by zero. But why did no errors occur when I set batch_size = 1? Why did my network train as well as I expected? Could anyone explain this? Some people argued that: The
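One way to probe this empirically is a minimal sketch of the batch-norm computation itself (this assumes the common 3D convention, e.g. PyTorch's `BatchNorm3d`, where statistics are computed over the batch *and* spatial dimensions, and an `eps` term is added inside the square root; the question does not say which framework is in use, so treat the shapes and constants here as illustrative):

```python
import numpy as np

# Hypothetical toy "3D medical image" batch: N=1, C=2, D=H=W=4.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 2, 4, 4, 4))

eps = 1e-5  # the small epsilon batch norm adds inside the sqrt

# For 3D inputs, batch norm computes per-channel statistics over the
# batch AND spatial axes, so even with batch_size = 1 each channel's
# variance is taken over 1*4*4*4 = 64 voxels, not a single value.
axes = (0, 2, 3, 4)
mean = x.mean(axis=axes, keepdims=True)
var = x.var(axis=axes, keepdims=True)

y = (x - mean) / np.sqrt(var + eps)

print(var.ravel())        # per-channel variances: nonzero
print(y.mean(), y.std())  # roughly 0 and 1 after normalization
```

Under these assumptions, the per-channel variance is generally nonzero because the voxels within each channel differ, and the `eps` term guards against division by zero even if it were not.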