batch-normalization

batch normalization, yes or no?

淺唱寂寞╮ submitted on 2019-12-04 23:49:43
Question: I use TensorFlow 1.14.0 and Keras 2.2.4. The following code implements a simple neural network:

    import numpy as np
    np.random.seed(1)
    import random
    random.seed(2)
    import tensorflow as tf
    tf.set_random_seed(3)

    from tensorflow.keras.models import Model, Sequential
    from tensorflow.keras.layers import Input, Dense, Activation

    x_train = np.random.normal(0, 1, (100, 12))

    model = Sequential()
    model.add(Dense(8, input_shape=(12,)))
    # model.add(tf.keras.layers.BatchNormalization())
    model.add(Activation(
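
The snippet is cut off at the Activation layer. A minimal sketch of how the "yes or no" comparison is usually set up, where the 'relu' activation, the output layer, the random targets, and the compile/fit settings are assumptions rather than part of the original post:

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense, Activation, BatchNormalization

    x_train = np.random.normal(0, 1, (100, 12))   # random inputs, as in the question
    y_train = np.random.normal(0, 1, (100, 1))    # hypothetical targets, for illustration only

    model = Sequential()
    model.add(Dense(8, input_shape=(12,)))
    model.add(BatchNormalization())               # comment this line out to compare "yes or no"
    model.add(Activation('relu'))
    model.add(Dense(1))

    model.compile(optimizer='adam', loss='mse')
    model.fit(x_train, y_train, epochs=5, batch_size=16, verbose=0)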

Batch Normalization doesn't have gradient in tensorflow 2.0?

依然范特西╮ submitted on 2019-12-04 12:42:00
I am trying to make a simple GAN to generate digits from the MNIST dataset. However, when I get to training (which is custom) I get an annoying warning that I suspect is the cause of it not training the way I'm used to. Keep in mind this is all in TensorFlow 2.0, using its default eager execution.

GET THE DATA (not that important)

    (train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()
    train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')
    train_images = (train_images - 127.5) / 127.5  # Normalize the images to [-1, 1]
    BUFFER_SIZE =
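
The warning usually appears when gradients are requested with respect to a BatchNormalization layer's moving_mean and moving_variance, which are updated during the forward pass rather than trained by gradients. A small sketch, assuming a toy model rather than the poster's GAN, of differentiating only through model.trainable_variables:

    import tensorflow as tf

    # Tiny stand-in model containing a BatchNormalization layer (illustration only)
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, input_shape=(8,)),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation('relu'),
        tf.keras.layers.Dense(1),
    ])

    x = tf.random.normal((32, 8))
    y = tf.random.normal((32, 1))

    with tf.GradientTape() as tape:
        pred = model(x, training=True)              # training=True: BN uses batch statistics
        loss = tf.reduce_mean(tf.square(pred - y))

    # moving_mean / moving_variance live in model.variables but not in
    # model.trainable_variables, so taking gradients only with respect to the
    # trainable variables avoids the "no gradient" warning.
    grads = tape.gradient(loss, model.trainable_variables)
    tf.keras.optimizers.Adam().apply_gradients(zip(grads, model.trainable_variables))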

Tensorflow and Batch Normalization with Batch Size==1 => Outputs all zeros

泪湿孤枕 submitted on 2019-12-04 08:45:03
Question: I have a question about my understanding of BatchNorm (BN from here on). I have a convnet working nicely, and I was writing tests to check shapes and output ranges. I noticed that when I set batch_size = 1, my model outputs zeros (logits and activations). I prototyped the simplest convnet with BN: Input => Conv + ReLU => BN => Conv + ReLU => BN => Conv + Tanh. The model is initialized with Xavier initialization. I guess that BN during training does some calculations that require
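
The zeros follow from how BN normalizes in training mode: with a batch of one, the batch mean equals the sample itself, so the centred activations collapse to zero (plus beta, which starts at zero). A tiny sketch of that effect on a dense feature vector (illustrative only, not the poster's convnet):

    import tensorflow as tf

    bn = tf.keras.layers.BatchNormalization()
    x = tf.random.normal((1, 4))        # a single example in the batch

    # Training mode: statistics come from the current batch. With batch_size == 1
    # the batch mean is x itself, so (x - mean) / std is zero everywhere.
    print(bn(x, training=True))         # ~ all zeros

    # Inference mode: the moving averages are used, so the output is not zeroed.
    print(bn(x, training=False))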

Tensorflow batch_norm does not work properly when testing (is_training=False)

て烟熏妆下的殇ゞ submitted on 2019-12-03 13:33:38
I am training the following model:

    with slim.arg_scope(inception_arg_scope(is_training=True)):
        logits_v, endpoints_v = inception_v3(all_v, num_classes=25, is_training=True, dropout_keep_prob=0.8,
                                             spatial_squeeze=True, reuse=reuse_variables, scope='vis')
        logits_p, endpoints_p = inception_v3(all_p, num_classes=25, is_training=True, dropout_keep_prob=0.8,
                                             spatial_squeeze=True, reuse=reuse_variables, scope='pol')
        pol_features = endpoints_p['pol/features']
        vis_features = endpoints_v['vis/features']
        eps = 1e-08
        loss = tf.sqrt(tf.maximum(tf.reduce_sum(tf.square(pol_features - vis_features), axis=1,
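
A common cause of slim batch_norm misbehaving at test time is that the moving-average update ops are never run during training, so inference falls back on stale statistics. A sketch of the usual TF1-style fix, where the optimizer and learning rate are placeholders rather than values from the question:

    # Batch-norm moving averages are updated through ops placed in the
    # UPDATE_OPS collection; they must be attached to the train step explicitly.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)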

Ways to implement multi-GPU BN layers with synchronizing means and vars

南笙酒味 submitted on 2019-12-03 02:41:20
I'd like to know the possible ways to implement batch normalization layers that synchronize batch statistics when training with multiple GPUs.

Caffe: Maybe there are some variants of Caffe that could do this, like link. But for the BN layer, my understanding is that it still synchronizes only the outputs of layers, not the means and vars. Maybe MPI can synchronize means and vars, but I think MPI is a little difficult to implement.

Torch: I've seen some comments here and here, which show that running_mean and running_var can be synchronized, but I think the batch mean and batch var cannot be, or are difficult to
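
As an aside, later framework releases ship this synchronization out of the box; these APIs post-date the question, and the exact module path has moved between TensorFlow versions, so treat the sketch below (with a purely illustrative model) as a pointer rather than the asker's setup:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()     # one replica per visible GPU

    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation='relu', input_shape=(28, 28, 1)),
            # Aggregates the batch mean/variance across all replicas at each step
            tf.keras.layers.experimental.SyncBatchNormalization(),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(10),
        ])
        model.compile(optimizer='adam',
                      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))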

Batch normalization instead of input normalization

这一生的挚爱 submitted on 2019-12-02 17:35:52
Can I use a batch normalization layer right after the input layer and not normalize my data? Can I expect to get a similar effect/performance? In the Keras functional API it would be something like this:

    x = Input(...)
    x = Batchnorm(...)(x)
    ...

Answer (Maxim): You can do it. But the nice thing about batchnorm, in addition to stabilizing the activation distributions, is that the mean and standard deviation are likely to migrate as the network learns. Effectively, placing batchnorm right after the input layer is a fancy data pre-processing step. It helps, sometimes a lot (e.g. in linear regression). But it's easier and more
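
A concrete version of that pseudocode, where the input size, hidden layer, and compile settings are placeholders and not from the original post:

    import tensorflow as tf
    from tensorflow.keras.layers import Input, Dense, BatchNormalization
    from tensorflow.keras.models import Model

    inputs = Input(shape=(20,))
    # BatchNormalization applied to the raw inputs stands in for manual feature
    # scaling; its statistics are learned from the data during training.
    x = BatchNormalization()(inputs)
    x = Dense(64, activation='relu')(x)
    outputs = Dense(1)(x)

    model = Model(inputs, outputs)
    model.compile(optimizer='adam', loss='mse')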

Instance Normalisation vs Batch normalisation

好久不见. submitted on 2019-12-02 14:06:46
I understand that Batch Normalisation helps with faster training by pushing the activations towards a unit Gaussian distribution, thus tackling the vanishing-gradient problem. Batch norm is applied differently at training time (use the mean/var of each batch) and at test time (use the finalized running mean/var from the training phase). Instance normalisation, on the other hand, acts as contrast normalisation, as mentioned in this paper: https://arxiv.org/abs/1607.08022. The authors mention that the output stylised images should not depend on the contrast of the input content image, and hence Instance
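
In practice the difference is just which axes the statistics are averaged over. A small hand-rolled sketch on NHWC feature maps (illustrative only; real layers also add a learnable scale and shift):

    import tensorflow as tf

    x = tf.random.normal((8, 32, 32, 16))   # NHWC batch of feature maps

    # Batch norm: one mean/variance per channel, averaged over batch + spatial dims
    bn_mean, bn_var = tf.nn.moments(x, axes=[0, 1, 2], keepdims=True)
    x_bn = (x - bn_mean) / tf.sqrt(bn_var + 1e-5)

    # Instance norm: one mean/variance per sample and per channel, averaged over
    # spatial dims only -- each image's own contrast is normalized away
    in_mean, in_var = tf.nn.moments(x, axes=[1, 2], keepdims=True)
    x_in = (x - in_mean) / tf.sqrt(in_var + 1e-5)

    print(x_bn.shape, x_in.shape)           # both (8, 32, 32, 16)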