batch-normalization

BatchNorm momentum convention PyTorch

Submitted by 自闭症网瘾萝莉.ら on 2020-07-17 05:46:04
Question: Is the batchnorm momentum convention (default=0.1) correct? In other libraries, e.g. TensorFlow, it seems to usually be 0.9 or 0.99 by default. Or maybe we are just using a different convention? Answer 1: The parametrization convention is different in PyTorch than in TensorFlow, so that 0.1 in PyTorch is equivalent to 0.9 in TensorFlow. To be more precise: In TensorFlow: running_mean = decay*running_mean + (1-decay)*new_value. In PyTorch: running_mean = (1-decay)*running_mean + decay*new_value.
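The equivalence of the two conventions can be checked numerically. A minimal sketch in plain Python (function names are illustrative, not library APIs):

```python
# TensorFlow-style update: decay weights the OLD running statistic.
def tf_update(running_mean, new_value, decay=0.9):
    return decay * running_mean + (1 - decay) * new_value

# PyTorch-style update: momentum weights the NEW batch statistic.
def torch_update(running_mean, new_value, momentum=0.1):
    return (1 - momentum) * running_mean + momentum * new_value

# With decay=0.9 and momentum=0.1 the two updates agree
# (up to floating-point rounding).
assert abs(tf_update(10.0, 2.0) - torch_update(10.0, 2.0)) < 1e-9
```

So a TensorFlow decay of 0.99 corresponds to a PyTorch momentum of 0.01, not 0.99.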

Why would moving_mean and moving_variance in a TensorFlow BN layer become nan when I set is_training=False at training time?

Submitted by 随声附和 on 2020-07-10 06:33:26
Question: At training time, I want to keep the BN layer unchanged, so I pass is_training=False to: tf.contrib.layers.batch_norm(tensor_go_next, decay=0.9, center=True, scale=True, epsilon=1e-9, updates_collections=tf.GraphKeys.UPDATE_OPS, is_training=False, scope=name_bn_scope) and I didn't put name_bn_scope/gamma:0 or name_bn_scope/beta:0 into the training var_list. After training, gamma and beta are still the same, which is exactly what I want. But moving_mean and moving_variance become a nan matrix after…
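For reference, batch-norm layers maintain moving_mean and moving_variance with an exponential moving average. A minimal sketch of that update rule in plain Python (names are illustrative, not the TensorFlow internals):

```python
def ema_update(moving_stat, batch_stat, decay=0.9, is_training=True):
    """Exponential-moving-average update for a BN moving statistic.
    A frozen layer (is_training=False) should leave the statistic untouched."""
    if not is_training:
        return moving_stat
    return decay * moving_stat + (1 - decay) * batch_stat

m = 0.0
for batch_mean in [1.0, 1.0, 1.0]:
    m = ema_update(m, batch_mean, is_training=False)
print(m)  # still 0.0: a frozen layer never modifies its moving statistics
```

If the statistics change (or turn to nan) despite is_training=False, that suggests the update ops collected in UPDATE_OPS are still being executed somewhere in the training loop.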

How to calculate batch normalization with python?

Submitted by 梦想的初衷 on 2020-01-30 10:55:06
Question: When I implement batch normalization in Python from scratch, I am confused. A paper demonstrates some figures about normalization methods, and I think they may not be correct; both the description and the figure seem wrong. Description from the paper: Figure from the paper: As far as I am concerned, the representation of batch normalization in the original paper is not correct. I post the issue here for discussion. I think batch normalization should be like the following figure. The…
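As a concrete reference point for the discussion, here is a minimal NumPy sketch of the standard batch-normalization forward pass over the batch axis (the gamma/beta/eps names follow the usual convention; this is an illustration, not the paper's code):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of shape (N, features) over the batch axis N."""
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature (biased) variance
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta              # learnable scale and shift

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(x)
print(y.mean(axis=0))  # per-feature mean is ~0 after normalization
print(y.std(axis=0))   # per-feature std is ~1 after normalization
```

The key point is that the statistics are computed per feature across the batch dimension, not per sample across features (which would be layer normalization).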

Batch normalization layer for CNN-LSTM

Submitted by 只谈情不闲聊 on 2020-01-22 16:11:09
Question: Suppose that I have a model like this (a model for time-series forecasting): ipt = Input((data.shape[1], data.shape[2])) # 1 x = Conv1D(filters=10, kernel_size=3, padding='causal', activation='relu')(ipt) # 2 x = LSTM(15, return_sequences=False)(x) # 3 x = BatchNormalization()(x) # 4 out = Dense(1, activation='relu')(x) # 5 Now I want to add a batch normalization layer to this network. Considering the fact that batch normalization doesn't work well with LSTM, can I add it before…

Batch normalization when batch size=1

Submitted by 心不动则不痛 on 2020-01-21 18:59:44
Question: What will happen when I use batch normalization but set batch_size = 1? Because I am using 3D medical images as the training dataset, the batch size can only be set to 1 due to GPU memory limitations. Normally, I know that when batch_size = 1 the variance will be 0, and (x-mean)/variance will lead to an error because of division by 0. But why did no error occur when I set batch_size = 1? Why did my network train as well as I expected? Could anyone explain it? Some people argued that: The…
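The division-by-zero concern can be checked directly. A minimal NumPy sketch (using the biased variance, as batch norm does) showing why the epsilon term keeps a batch of one finite:

```python
import numpy as np

x = np.array([[2.0, -1.0]])  # a single sample: batch_size = 1
mean = x.mean(axis=0)        # equals the sample itself
var = x.var(axis=0)          # biased variance of a single sample is 0
eps = 1e-5

# Frameworks normalize by sqrt(var + eps), so var == 0 does not divide by zero.
x_hat = (x - mean) / np.sqrt(var + eps)
print(x_hat)  # all zeros: finite, no NaN or inf
```

Note that the normalized output collapses to all zeros when batch_size = 1, so while no error is raised, the layer is no longer doing meaningful normalization at training time.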