PyTorch BatchNorm layer different from Keras BatchNorm
Question

I'm trying to copy pre-trained BN weights from a PyTorch model to its equivalent Keras model, but I keep getting different outputs. I read the Keras and PyTorch BN documentation, and I think the difference lies in the way they calculate the "mean" and "var".

PyTorch: "The mean and standard-deviation are calculated per-dimension over the mini-batches" (source: PyTorch BatchNorm). Thus, they average over samples.

Keras: "axis: Integer, the axis that should be normalized (typically the features axis)."
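One way to check whether the mean/var computation is really the culprit is to redo the inference-mode arithmetic by hand. The following is only a NumPy sketch, not either library's implementation: `batchnorm_inference` is a hypothetical helper, the "pre-trained" parameters are random stand-ins, and the layouts (channels on axis 1 for the PyTorch-style tensor, channels last for the Keras-style tensor) are assumptions. The two `eps` values are the documented defaults as I understand them (`eps=1e-5` for PyTorch's `BatchNorm2d`, `epsilon=1e-3` for Keras's `BatchNormalization`).

```python
import numpy as np

def batchnorm_inference(x, gamma, beta, mean, var, eps, channel_axis):
    """Hypothetical helper: normalize x with stored per-channel statistics."""
    # Reshape the per-channel vectors so they broadcast along channel_axis.
    shape = [1] * x.ndim
    shape[channel_axis] = -1
    mean, var = mean.reshape(shape), var.reshape(shape)
    gamma, beta = gamma.reshape(shape), beta.reshape(shape)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
x_nchw = rng.normal(size=(4, 3, 5, 5))    # assumed PyTorch-style layout (N, C, H, W)
x_nhwc = x_nchw.transpose(0, 2, 3, 1)     # assumed Keras-style layout (N, H, W, C)

# Random stand-ins for pre-trained parameters, one value per channel.
gamma = rng.normal(size=3)
beta = rng.normal(size=3)
running_mean = rng.normal(size=3)
running_var = rng.uniform(0.5, 2.0, size=3)
params = (gamma, beta, running_mean, running_var)

# Batch statistics reduce over every axis except the channel axis in both layouts.
batch_mean_nchw = x_nchw.mean(axis=(0, 2, 3))
batch_mean_nhwc = x_nhwc.mean(axis=(0, 1, 2))

out_torch_like = batchnorm_inference(x_nchw, *params, eps=1e-5, channel_axis=1)
out_keras_same_eps = batchnorm_inference(x_nhwc, *params, eps=1e-5, channel_axis=-1)
out_keras_default_eps = batchnorm_inference(x_nhwc, *params, eps=1e-3, channel_axis=-1)

# Compare in a common layout by moving the Keras-style outputs back to NCHW.
same_eps_match = np.allclose(out_torch_like, out_keras_same_eps.transpose(0, 3, 1, 2))
default_eps_max_diff = np.abs(
    out_torch_like - out_keras_default_eps.transpose(0, 3, 1, 2)
).max()

print("per-channel batch means agree:", np.allclose(batch_mean_nchw, batch_mean_nhwc))
print("outputs agree with equal eps:", same_eps_match)
print("max abs diff with default eps values:", default_eps_max_diff)
```

In this sketch the per-channel statistics come out the same in both layouts, and the outputs only drift apart when the epsilon values differ. If that carries over to the real layers, the things to check would be the epsilon setting, the channel axis, whether both models run in inference mode, and the order in which the weights are passed to `set_weights` (Keras expects gamma, beta, moving mean, moving variance when both scale and center are enabled).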