I understand that Batch Normalisation helps in faster training by turning the activation towards unit Gaussian distribution and thus tackling vanishing gradients problem. Batch
I wanted to add more information to this question since there are some more recent works in this area. Your intuition
use instance normalisation for image classification where class label should not depend on the contrast of input image
is partly correct. I would say that a pig in broad daylight is still a pig when the image is taken at night or at dawn. However, this does not mean using instance normalization across the network will give you better result. Here are some reasons:
IBN-Net uses both batch normalization and instance normalization in their model. They only put instance normalization in early layers and have achieved improvement in both accuracy and ability to generalize. They have open sourced code here.