I'm using batch normalization in my neural network model. Does tuning the momentum really make a huge difference? From what I know, a smaller batch_size should use a higher value for