How to accumulate gradients in tensorflow?

小鲜肉 2021-01-04 08:24

I have a question similar to this one.

Because I have limited resources and I work with a deep model (VGG-16) - used to train a triplet network - I want to accumulat

2 answers
  • 2021-01-04 09:14

    TensorFlow 2.0 compatible answer: in line with Pop's answer and the explanation on the TensorFlow website, the following code accumulates gradients in TensorFlow 2.0:

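    import tensorflow as tf

    # mnist_model, dataset, loss_object, optimizer and loss_history are assumed
    # to be defined elsewhere (see the complete example).
    # accum_steps (mini-batches per weight update) is an added parameter.
    def train(epochs, accum_steps=4):
      tvs = mnist_model.trainable_variables
      # One non-trainable accumulator per trainable variable, created once
      accum_vars = [tf.Variable(tf.zeros_like(tv), trainable=False) for tv in tvs]
      for epoch in range(epochs):
        for (batch, (images, labels)) in enumerate(dataset):
          with tf.GradientTape() as tape:
            logits = mnist_model(images, training=True)
            loss_value = loss_object(labels, logits)

          loss_history.append(loss_value.numpy().mean())
          grads = tape.gradient(loss_value, tvs)
          # Add this mini-batch's gradients to the accumulators
          for accum_var, grad in zip(accum_vars, grads):
            accum_var.assign_add(grad)

          # Every accum_steps mini-batches: apply the accumulated gradients, then reset
          if (batch + 1) % accum_steps == 0:
            optimizer.apply_gradients(zip([a.read_value() for a in accum_vars], tvs))
            for accum_var in accum_vars:
              accum_var.assign(tf.zeros_like(accum_var))
        print('Epoch {} finished'.format(epoch))

    # call the above function
    train(epochs=3)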
    

    The complete code can be found in this GitHub Gist.
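
    To make the accumulate / apply / reset cycle concrete without depending on TensorFlow, here is a minimal pure-Python sketch. The toy model (a single weight w fitted to y = 3x), the names accum_steps and lr, and the plain SGD update are all assumptions made for illustration; none of them come from the answer or the gist. Dividing the accumulated sum by accum_steps is one possible choice, not something the answer prescribes.

```python
def grad_squared_error(w, x, y):
    """Gradient of (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x


def train_with_accumulation(samples, w=0.0, lr=0.01, accum_steps=4, epochs=50):
    """SGD on one weight, applying an update only every accum_steps samples."""
    accum = 0.0   # hypothetical stand-in for the accumulator variables
    updates = 0
    for _ in range(epochs):
        for step, (x, y) in enumerate(samples, start=1):
            accum += grad_squared_error(w, x, y)   # accumulate, like assign_add
            if step % accum_steps == 0:
                w -= lr * accum / accum_steps      # apply the averaged gradient
                accum = 0.0                        # reset the accumulator
                updates += 1
    return w, updates


# Toy data generated from y = 3x (an assumption for this sketch)
samples = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0, 1.5, 2.5, 3.5, 0.5)]
w, updates = train_with_accumulation(samples)
print(w, updates)
```

    With 8 samples and accum_steps=4, the weight is updated twice per epoch, while memory only ever holds one sample's gradient plus the accumulator.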

  • 2021-01-04 09:20

    Let's walk through the code proposed in one of the answers you linked to:

    ## Optimizer definition - nothing different from any classical example
    ## (this is the TensorFlow 1.x graph-mode API)
    import tensorflow as tf
    opt = tf.train.AdamOptimizer()
    
    ## Retrieve all trainable variables you defined in your graph
    tvs = tf.trainable_variables()
    ## Creation of a list of variables with the same shape as the trainable ones
    # initialized with 0s
    accum_vars = [tf.Variable(tf.zeros_like(tv.initialized_value()), trainable=False) for tv in tvs]
    zero_ops = [tv.assign(tf.zeros_like(tv)) for tv in accum_vars]
    
    ## Call the optimizer's compute_gradients function to obtain a list of
    ## (gradient, variable) pairs; rmse is the loss tensor being minimized
    gvs = opt.compute_gradients(rmse, tvs)
    
    ## Add each gradient to the matching zero-initialized accumulator
    ## (works because accum_vars and gvs are in the same order)
    accum_ops = [accum_vars[i].assign_add(gv[0]) for i, gv in enumerate(gvs)]
    
    ## Define the training step (part with variable value update)
    train_step = opt.apply_gradients([(accum_vars[i], gv[1]) for i, gv in enumerate(gvs)])
    

    This first part adds new variables and ops to your graph, which allow you to:

    1. Accumulate the gradients in the variables accum_vars by running the ops accum_ops
    2. Update the model weights by running the op train_step

    Then, to use it when training, you have to follow these steps (still from the answer you linked):

    ## The while loop for training
    while ...:
        # Run zero_ops to reset the accumulators to zero
        sess.run(zero_ops)
        # Accumulate the gradients 'n_minibatches' times in accum_vars using accum_ops
        for i in range(n_minibatches):
            sess.run(accum_ops, feed_dict={X: Xs[i], y: ys[i]})
        # Run the train_step op to update the weights based on the accumulated gradients
        sess.run(train_step)
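
    The reason this loop can stand in for one large batch is that gradients add: for a loss that is a sum over examples, summing the gradients of the mini-batches gives the gradient of the whole batch. Below is a small pure-Python check of that claim, assuming a one-weight linear model with squared error. The helper names and the data are hypothetical, chosen only for illustration.

```python
def batch_grad(w, batch):
    """Gradient w.r.t. w of sum((w*x - y)**2) over the batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)


def accumulated_grad(w, batch, minibatch_size):
    """Sum the gradients of consecutive mini-batches into one accumulator."""
    accum = 0.0  # starts at zero, like the accumulators after a reset
    for start in range(0, len(batch), minibatch_size):
        accum += batch_grad(w, batch[start:start + minibatch_size])
    return accum


# Made-up (x, y) pairs for the check
data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 8.0), (5.0, 9.0), (6.0, 13.0)]
full = batch_grad(0.5, data)
accumulated = accumulated_grad(0.5, data, minibatch_size=2)
print(full, accumulated)
```

    If the loss is a mean over each mini-batch rather than a sum, the accumulated value has to be divided by the number of (equally sized) mini-batches to match the large-batch gradient.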
    