Poor Result with BatchNormalization

Submitted by ↘锁芯ラ on 2020-12-13 03:43:49

Question


I have been trying to implement DCGAN (the Facebook paper) and have been blocked by the two issues below for almost two weeks. Any suggestions would be appreciated. Thanks.

Issue 1:

The DCGAN paper suggests using BN (Batch Normalization) in both the generator and the discriminator, but I get worse results with BN than without it.

I have copied the DCGAN model I used below; it is exactly the same as in the DCGAN paper. I don't think the problem is overfitting, because (1) the output keeps showing the same noise as the initial noise picture and the model never seems to train, and (2) the loss values are very stable, so neither the GAN nor the discriminator is really changing. (Both stay around 0.6 ~ 0.7 and never drop or spike the way they do when the models collapse.) Judging by the loss alone, it looks as if training is going well.

Issue 2:

When I use float16, the model below always gives me NaN. I have changed epsilon to 1e-4 and to 1e-3, but both failed. And here is one more question: if I don't use BatchNormalization, I can understand how it becomes NaN; that makes enough sense to me. But if I use BatchNormalization, it normalizes at every layer, so even if an intermediate result becomes a very big or very small number, it gets batch normalized at every layer, the activations stay roughly centered, and the fade-out shouldn't happen, should it? That is my reasoning, but I don't know where it goes wrong. Please, somebody, help me.
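
One possible gap in that reasoning: BatchNormalization only re-centres its output. The reductions it computes internally (sums and squares over the batch) and the gradients flowing backwards are still stored in float16, whose range is very narrow, so an overflow can appear before any normalization helps. A minimal sketch of the numeric limits, assuming NumPy and purely for illustration (not a diagnosis of this exact model):

import numpy as np

# float16 overflows at ~65504; an inf produced anywhere easily turns into NaN later
print(np.finfo(np.float16).max)                        # 65504.0
big = np.float16(60000.0)
print(big * np.float16(2.0))                           # inf (overflow)
print(big * np.float16(2.0) - big * np.float16(2.0))   # inf - inf = nan

# squares taken while computing the batch variance can overflow before the
# normalization step is ever reached
print(np.square(np.float16(300.0)))                    # 300^2 = 90000 -> inf in float16

If that is what happens here, the usual alternative to running everything in pure float16 is Keras' mixed_float16 policy with loss scaling, which keeps the numerically sensitive parts in float32.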

===== Generator =====

Input # (None, 128) <= latent

Dense # (None, 16384)
BatchNormalization
LeakyReLU

Reshape # (None, 4, 4, 1024)

Conv2DTranspose # (None, 4, 4, 512)
BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 8, 8, 256)
BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 16, 16, 128)
BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 32, 32, 64)
BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 64, 64, 32)
BatchNormalization
LeakyReLU

Conv2DTranspose # (None, 128, 128, 16)
BatchNormalization
LeakyReLU

Conv2D # (None, 128, 128, 3)

===== Discriminator =====

Conv2D # (None, 128, 128, 3)
LeakyReLU

Conv2D # (None, 64, 64, 16)
BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 32, 32, 32)
BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 16, 16, 64)
BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 8, 8, 128)
BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 4, 4, 256)
BatchNormalization
Dropout
LeakyReLU

Conv2D # (None, 2, 2, 512)
BatchNormalization
Dropout
LeakyReLU

Flatten
Dropout
Dense

The last hyperparameters I tried are below, and I did not forget to add Gaussian noise to the training pictures.

image_shape => (128, 128, 3)
latent_dim => 128
channels => 3
iterations => 10000
batch_size => 128
epsilon => 0.005
weight_init_stddev => 0.02
beta_1 => 0.5
discriminator_lr => 0.0002
gan_lr => 0.0002
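
For reference, a minimal sketch (Keras assumed from the layer names above) of how those hyperparameters are typically wired in; the kernel size, stride and LeakyReLU slope are illustrative assumptions, not values taken from the question:

from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.layers import BatchNormalization, Conv2DTranspose, LeakyReLU
from tensorflow.keras.optimizers import Adam

weight_init = RandomNormal(stddev=0.02)                # weight_init_stddev
d_optimizer = Adam(learning_rate=0.0002, beta_1=0.5)   # discriminator_lr, beta_1
g_optimizer = Adam(learning_rate=0.0002, beta_1=0.5)   # gan_lr, beta_1

def deconv_block(x, filters):
    # one generator upsampling block using the initializer and epsilon listed above
    x = Conv2DTranspose(filters, kernel_size=5, strides=2, padding='same',
                        kernel_initializer=weight_init)(x)
    x = BatchNormalization(epsilon=0.005)(x)           # epsilon
    return LeakyReLU(0.2)(x)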

Answer 1:


I don't know the details of the DCGAN paper, but looking into it I can find the guidelines below for building a stable DCGAN. Why did you use LeakyReLU in the generator instead of ReLU? (A minimal sketch following these guidelines comes after the list.)

Architecture guidelines for stable Deep Convolutional GANs

  • Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
  • Use batchnorm in both the generator and the discriminator.
  • Remove fully connected hidden layers for deeper architectures.
  • Use ReLU activation in generator for all layers except for the output, which uses Tanh.
  • Use LeakyReLU activation in the discriminator for all layers
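
Following those guidelines, a minimal sketch (Keras assumed; filter counts and kernel size are illustrative, not taken from the paper) of a generator block with ReLU in the hidden layers and tanh on the output:

from tensorflow.keras import layers

def generator_block(x, filters):
    # fractionally-strided convolution + batchnorm + ReLU, per the guidelines
    x = layers.Conv2DTranspose(filters, kernel_size=5, strides=2, padding='same',
                               use_bias=False)(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)          # ReLU, not LeakyReLU, inside the generator

def generator_output(x, channels=3):
    # output layer uses tanh; no batchnorm here (a common DCGAN convention)
    return layers.Conv2DTranspose(channels, kernel_size=5, strides=2,
                                  padding='same', activation='tanh')(x)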



Answer 2:


As we know, batch normalization updates its moving statistics roughly like the pseudocode below:

moving_mean = None
moving_variance = None

# exponential moving average of the batch statistics, updated once per batch
if moving_mean is None:
    moving_mean = current_batch_mean
else:
    moving_mean = moving_mean * momentum + current_batch_mean * (1 - momentum)

if moving_variance is None:
    moving_variance = current_batch_variance
else:
    moving_variance = moving_variance * momentum + current_batch_variance * (1 - momentum)

And here is the point:

The default momentum in TensorFlow and Keras is 0.99, so if you use it without modification, each new batch's statistics barely move the updated value. In PyTorch the default momentum is 0.1, which corresponds to 0.9 in TensorFlow/Keras, because PyTorch weights the new batch statistics by momentum instead of by 1 - momentum.

With a reduced momentum value, I got improved results.

If someone is suffering from the same symptom as me, try reducing the momentum value, for example as in the sketch below.
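
A minimal sketch (Keras assumed; layer sizes are illustrative) of a block with a reduced BatchNormalization momentum:

from tensorflow.keras import layers, models

def bn_block(x, filters):
    # Keras' default momentum is 0.99; a smaller value lets the moving statistics
    # follow the recent batches much more quickly
    x = layers.Conv2DTranspose(filters, kernel_size=5, strides=2, padding='same')(x)
    x = layers.BatchNormalization(momentum=0.9)(x)
    return layers.LeakyReLU(0.2)(x)

inputs = layers.Input(shape=(4, 4, 1024))
model = models.Model(inputs, bn_block(inputs, 512))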

Thanks.




Answer 3:


Use SpectralNorm in the discriminator and SelfModulationBatchNorm in the generator, or ConditionalBatchNorm if you have labels.

Code, help with other methods, and GAN training can be found here.
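
As a rough sketch of the spectral-normalization idea (assuming TensorFlow Addons is installed; this is not the code the answer links to), a discriminator convolution can be wrapped like this:

import tensorflow as tf
import tensorflow_addons as tfa

# constraining the spectral norm of the discriminator weights is a common way to
# stabilize GAN training without relying on batchnorm in the discriminator
inputs = tf.keras.layers.Input(shape=(128, 128, 3))
x = tfa.layers.SpectralNormalization(
    tf.keras.layers.Conv2D(64, kernel_size=4, strides=2, padding='same'))(inputs)
x = tf.keras.layers.LeakyReLU(0.2)(x)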



Source: https://stackoverflow.com/questions/58376226/poor-result-with-batchnormalization
