What's the difference between GradientTape, implicit_gradients, gradients_function and implicit_value_and_gradients?

爱一瞬间的悲伤 2021-02-02 01:55

I'm trying to switch to TensorFlow eager mode and I find the documentation of GradientTape, implicit_gradients, gradients_function and

1 answer
  •  慢半拍i (original poster)
    2021-02-02 02:57

    There are 4 ways to automatically compute gradients when eager execution is enabled (actually, they also work in graph mode):

    • tf.GradientTape context records computations so that you can call tape.gradient() to get the gradients of any tensor computed while recording with respect to any trainable variable.
    • tfe.gradients_function() takes a function (say f()) and returns a gradient function (say fg()) that can compute the gradients of the outputs of f() with respect to the parameters of f() (or a subset of them).
    • tfe.implicit_gradients() is very similar, but fg() computes the gradients of the outputs of f() with respect to all trainable variables these outputs depend on.
    • tfe.implicit_value_and_gradients() is almost identical, but fg() also returns the output of the function f().

    Usually, in Machine Learning, you will want to compute the gradients of the loss with respect to the model parameters (i.e. variables), and you will generally also be interested in the value of the loss itself. For this use case, the simplest and most efficient options are tf.GradientTape and tfe.implicit_value_and_gradients() (the other two options do not give you the value of the loss itself, so if you need it, it will require extra computations). I personally prefer tfe.implicit_value_and_gradients() when writing production code, and tf.GradientTape when experimenting in a Jupyter notebook.

    Edit: In TF 2.0, it seems that only tf.GradientTape remains. Maybe the other functions will be added back, but I wouldn't count on it.
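
    A minimal sketch of how the behaviour of tfe.implicit_value_and_gradients() might be approximated with tf.GradientTape alone, assuming TensorFlow 2.x (where eager execution is on by default). The helper name value_and_gradients() is hypothetical and not part of TensorFlow; weighted_sum(), w1 and w2 mirror the example used in this answer:

```python
import tensorflow as tf  # assumption: TensorFlow 2.x, eager execution enabled by default

w1 = tf.Variable(2.0)
w2 = tf.Variable(3.0)

def weighted_sum(x1, x2):
    return w1 * x1 + w2 * x2

def value_and_gradients(f, *args):
    """Hypothetical helper: returns f(*args) and a list of (gradient, variable) pairs."""
    with tf.GradientTape() as tape:
        value = f(*args)
    # Trainable variables that were read while the tape was recording.
    variables = tape.watched_variables()
    grads = tape.gradient(value, variables)
    return value, list(zip(grads, variables))

s, grads_and_vars = value_and_gradients(weighted_sum, 5.0, 7.0)
print(s.numpy())  # 31.0
for grad, var in grads_and_vars:
    print(var.name, grad.numpy())
```

    The value and the gradients come out of a single recorded forward pass, which is the property the answer highlights for implicit_value_and_gradients().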

    Detailed example

    Let's create a small function to highlight the differences:

    import tensorflow as tf
    import tensorflow.contrib.eager as tfe
    tf.enable_eager_execution()
    
    w1 = tfe.Variable(2.0)
    w2 = tfe.Variable(3.0)

    def weighted_sum(x1, x2):
        return w1 * x1 + w2 * x2
    
    s = weighted_sum(5., 7.)
    print(s.numpy()) # 31
    

    Using tf.GradientTape

    Within a GradientTape context, all operations are recorded, so you can then compute the gradients of any tensor computed within the context with respect to any trainable variable. For example, this code computes s within the GradientTape context, and then computes the gradient of s with respect to w1. Since s = w1 * x1 + w2 * x2, the gradient of s with respect to w1 is x1:

    with tf.GradientTape() as tape:
        s = weighted_sum(5., 7.)  # recorded by the tape

    [w1_grad] = tape.gradient(s, [w1])
    print(w1_grad.numpy()) # 5.0 = gradient of s with respect to w1 = x1
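
    The claim that the gradient of s with respect to w1 equals x1 can be sanity-checked without TensorFlow. The sketch below uses a central finite difference in plain Python; the function numerical_gradient_w1() and the step size eps are illustrative assumptions, not part of any library:

```python
import math

def weighted_sum(w1, w2, x1, x2):
    return w1 * x1 + w2 * x2

def numerical_gradient_w1(w1, w2, x1, x2, eps=1e-3):
    # Central difference: (f(w1 + eps) - f(w1 - eps)) / (2 * eps)
    plus = weighted_sum(w1 + eps, w2, x1, x2)
    minus = weighted_sum(w1 - eps, w2, x1, x2)
    return (plus - minus) / (2 * eps)

approx = numerical_gradient_w1(2.0, 3.0, 5.0, 7.0)
print(math.isclose(approx, 5.0, rel_tol=1e-6))  # True: the estimate matches x1 = 5.0
```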
    

    Using tfe.gradients_function()

    This function returns another function that can compute the gradients of a function's returned value with respect to its parameters. For example, we can use it to define a function that will compute the gradients of s with respect to x1 and x2:

    grad_fn = tfe.gradients_function(weighted_sum)
    x1_grad, x2_grad = grad_fn(5., 7.)
    print(x1_grad.numpy()) # 2.0 = gradient of s with respect to x1 = w1
    

    In the context of optimization, it would make more sense to compute gradients with respect to variables that we can tweak. For this, we can change the weighted_sum() function to take w1 and w2 as parameters as well, and tell tfe.gradients_function() to only consider the parameters named "w1" and "w2":

    def weighted_sum_with_weights(w1, x1, w2, x2):
        return w1 * x1 + w2 * x2
    
    grad_fn = tfe.gradients_function(weighted_sum_with_weights, params=["w1", "w2"])
    [w1_grad, w2_grad] = grad_fn(w1, 5., w2, 7.)
    print(w2_grad.numpy()) # 7.0 = gradient of s with respect to w2 = x2
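
    For comparison, here is a rough sketch of how an explicit choice of differentiation targets might look with tf.GradientTape, assuming TensorFlow 2.x. Plain tensors such as x1 and x2 are not watched automatically, so the sketch calls tape.watch() on them; trainable variables such as w2 are watched by default:

```python
import tensorflow as tf  # assumption: TensorFlow 2.x

w1 = tf.Variable(2.0)
w2 = tf.Variable(3.0)
x1 = tf.constant(5.0)
x2 = tf.constant(7.0)

with tf.GradientTape() as tape:
    tape.watch([x1, x2])  # constants must be watched explicitly
    s = w1 * x1 + w2 * x2

# Pick any mix of inputs and variables as differentiation targets.
x1_grad, w2_grad = tape.gradient(s, [x1, w2])
print(x1_grad.numpy())  # 2.0 = w1
print(w2_grad.numpy())  # 7.0 = x2
```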
    

    Using tfe.implicit_gradients()

    This function returns another function that can compute the gradients of a function's returned value with respect to all trainable variables it depends on. Going back to the first version of weighted_sum(), we can use it to compute the gradients of s with respect to w1 and w2 without having to explicitly pass these variables. Note that the gradient function returns a list of gradient/variable pairs:

    grad_fn = tfe.implicit_gradients(weighted_sum)
    [(w1_grad, w1_var), (w2_grad, w2_var)] = grad_fn(5., 7.)
    print(w1_grad.numpy()) # 5.0 = gradient of s with respect to w1 = x1
    
    assert w1_var is w1
    assert w2_var is w2
    

    This function does seem like the simplest and most useful option, since generally we are interested in computing the gradients of the loss with respect to the model parameters (i.e. variables). Note: try making w1 untrainable (w1 = tfe.Variable(2., trainable=False)) and redefining weighted_sum(), and you will see that grad_fn only returns the gradient of s with respect to w2.
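
    The same experiment can be sketched with tf.GradientTape, assuming TensorFlow 2.x. There, a tape only watches trainable variables automatically, so the untrainable w1 is not recorded and its gradient comes back as None:

```python
import tensorflow as tf  # assumption: TensorFlow 2.x

w1 = tf.Variable(2.0, trainable=False)  # excluded from automatic watching
w2 = tf.Variable(3.0)

def weighted_sum(x1, x2):
    return w1 * x1 + w2 * x2

with tf.GradientTape() as tape:
    s = weighted_sum(5.0, 7.0)

watched = tape.watched_variables()
w1_grad, w2_grad = tape.gradient(s, [w1, w2])
print(len(watched))     # 1, only w2 was watched
print(w1_grad)          # None
print(w2_grad.numpy())  # 7.0
```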

    Using tfe.implicit_value_and_gradients()

    This function is almost identical to implicit_gradients() except the function it creates also returns the result of the function being differentiated (in this case weighted_sum()):

    grad_fn = tfe.implicit_value_and_gradients(weighted_sum)
    s, [(w1_grad, w1_var), (w2_grad, w2_var)] = grad_fn(5., 7.)
    print(s.numpy()) # 31.0 = s = w1 * x1 + w2 * x2
    

    When you need both the output of a function and its gradients, this function can give you a nice performance boost, since you get the output of the function for free when computing the gradients using autodiff.
