How to update variable of BatchNorm in multiple GPUs in Tensorflow

Submitted by 妖精的绣舞 on 2019-12-23 05:27:52

Question


I have a network that trains Batch Norm (BN) layers. My batch size is 16, so I have to use multiple GPUs. I have followed the Inception v3 example, which can be summarized as:

with tf.Graph().as_default(), tf.device('/cpu:0'):
    images_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=images)
    labels_splits = tf.split(axis=0, num_or_size_splits=FLAGS.num_gpus, value=labels)
    for i in range(FLAGS.num_gpus):
      with tf.device('/gpu:%d' % i):
        with tf.name_scope('%s_%d' % (inception.TOWER_NAME, i)) as scope:
          ...
          # Reuse variables for the next tower.
          batchnorm_updates = tf.get_collection(slim.ops.UPDATE_OPS_COLLECTION,
                                                scope)
          grads = opt.compute_gradients(loss)
          tower_grads.append(grads)
    grads = _average_gradients(tower_grads)
    apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
    variable_averages = tf.train.ExponentialMovingAverage(
        inception.MOVING_AVERAGE_DECAY, global_step)
    variables_to_average = (tf.trainable_variables() +
                            tf.moving_average_variables())
    variables_averages_op = variable_averages.apply(variables_to_average)
    batchnorm_updates_op = tf.group(*batchnorm_updates)
    train_op = tf.group(apply_gradient_op, variables_averages_op,
                        batchnorm_updates_op)

Unfortunately, that example uses the slim library for the BN layer, while I use the standard tf.contrib.layers.batch_norm:

def _batch_norm(self, x, name, is_training, activation_fn, trainable=False):
    with tf.variable_scope(name+'/BatchNorm') as scope:
        o = tf.contrib.layers.batch_norm(
            x,
            scale=True,
            activation_fn=activation_fn,
            is_training=is_training,
            trainable=trainable,
            scope=scope)
        return o

To collect the moving_mean and moving_variance updates, I used tf.GraphKeys.UPDATE_OPS:

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS) 
with tf.control_dependencies(update_ops):
    self.train_op = tf.group(train_op_conv, train_op_fc)

Finally, the idea of using BN on multiple GPUs can be borrowed from Inception v3 as follows:

split_image_batch = tf.split(self.image_batch, self.conf.num_gpus, 0)
split_label_batch = tf.split(self.label_batch, self.conf.num_gpus, 0)
global_step = tf.train.get_or_create_global_step()
opt= tf.train.MomentumOptimizer(self.learning_rate, self.conf.momentum)
tower_grads_encoder = []
tower_grads_decoder = []
update_ops=[]
with tf.variable_scope(tf.get_variable_scope()):
    for i in range(self.conf.num_gpus):
        with tf.device('/gpu:%d' % i):
            net = Resnet(split_image_batch[i], self.conf.num_classes) #Build BN layer
            # Loss function
            self.reduced_loss = tf.reduce_mean(loss) + tf.add_n(l2_losses)
            # Reuse variables for the next GPU.
            tf.get_variable_scope().reuse_variables()
            update_ops.extend(tf.get_collection(tf.GraphKeys.UPDATE_OPS))
            # Compute grads
            grads_encoder = opt.compute_gradients(self.reduced_loss, var_list=encoder_trainable)
            grads_decoder = opt.compute_gradients(self.reduced_loss, var_list=decoder_trainable)
            tower_grads_encoder.append(grads_encoder)
            tower_grads_decoder.append(grads_decoder)
grads_encoder = self._average_gradients(tower_grads_encoder)
grads_decoder = self._average_gradients(tower_grads_decoder)
# Update params
train_op_conv = opt.apply_gradients(grads_encoder, global_step=global_step)
train_op_fc   = opt.apply_gradients(grads_decoder, global_step=global_step)
variable_averages = tf.train.ExponentialMovingAverage(self.conf.MOVING_AVERAGE_DECAY, global_step)
variables_averages_op = variable_averages.apply(tf.trainable_variables())

with tf.control_dependencies(update_ops):
    self.train_op = tf.group(train_op_conv, train_op_fc, variables_averages_op)

Although the code runs without error, the performance is very low. It looks like I did not collect the BN parameters correctly. Could you look at my code and give me some direction for training BN on multiple GPUs? Thanks.


Answer 1:


I suspect the performance problems have to do with doing several variable updates per step (one from each batch norm layer in each tower).

Is there a reason you need to collect the batch norm updates from every GPU? We recommend using the statistics from a single tower to update batch norm; unless there is skew in your partitioning (which would cause other problems), the result should work out to be the same.

If you restrict your batch norm updates to those from a single tower, you reduce the number of variable updates by a factor of num_gpus.
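For example, here is a minimal sketch of that suggestion, assuming each tower is built inside a name scope such as tower_0, tower_1, ... (the scope names and the num_gpus variable are hypothetical, not from the question). It passes the tower's scope to tf.get_collection so only one set of moving_mean/moving_variance updates is retained:

# Hypothetical sketch: keep only the BN update ops created by the first tower.
update_ops = []
for i in range(num_gpus):
    with tf.device('/gpu:%d' % i):
        with tf.name_scope('tower_%d' % i) as scope:
            # ... build the model and compute gradients for this tower ...
            if i == 0:
                # Collect only the ops created under this tower's scope,
                # i.e. a single set of moving_mean/moving_variance updates.
                update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS, scope)

# Make the training op depend only on that single tower's BN updates
# (train_op_conv, train_op_fc, variables_averages_op mirror the names in the question).
with tf.control_dependencies(update_ops):
    train_op = tf.group(train_op_conv, train_op_fc, variables_averages_op)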



Source: https://stackoverflow.com/questions/48150720/how-to-update-variable-of-batchnorm-in-multiple-gpus-in-tensorflow
