When using BatchNormalization as a layer in my neural network model, does tuning the momentum really make a big difference? From what I know, a smaller batch_size should use a higher momentum.
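To make the question concrete, here is a minimal sketch of the running-statistics update I mean (assuming the Keras convention, where `moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean`). The `simulate` helper and its parameters are hypothetical, just to illustrate the interaction between batch size and momentum:

```python
import numpy as np

def update_running_mean(moving_mean, batch_mean, momentum):
    # Keras-style exponential moving average of the batch statistic.
    return momentum * moving_mean + (1.0 - momentum) * batch_mean

rng = np.random.default_rng(0)
# Toy activations with true mean 5.0.
data = rng.normal(loc=5.0, scale=2.0, size=10_000)

def simulate(batch_size, momentum, steps=200):
    # Track the running mean over many small batches.
    moving = 0.0
    for _ in range(steps):
        batch = rng.choice(data, size=batch_size)
        moving = update_running_mean(moving, batch.mean(), momentum)
    return moving

# Smaller batches give noisier batch means, so the intuition is that a
# higher momentum (heavier smoothing) keeps the running estimate stable.
estimate = simulate(batch_size=8, momentum=0.9)
```

With `momentum=0.9` and 200 steps the estimate settles near the true mean of 5.0; with a very small batch and a low momentum, the estimate jumps around with each batch. Is that the effect tuning momentum is supposed to control, and is it ever large enough to matter in practice?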