I'm trying to switch to TensorFlow eager mode, and I find the documentation of `GradientTape`, `implicit_gradients`, `gradients_function` and `implicit_value_and_gradients` confusing. What is the difference between them, and when should I use one rather than another?
There are 4 ways to automatically compute gradients when eager execution is enabled (actually, they also work in graph mode):

- A `tf.GradientTape` context records computations so that you can call `tape.gradient()` to get the gradients of any tensor computed while recording, with regards to any trainable variable.
- `tfe.gradients_function()` takes a function (say `f()`) and returns a gradient function (say `fg()`) that can compute the gradients of the outputs of `f()` with regards to the parameters of `f()` (or a subset of them).
- `tfe.implicit_gradients()` is very similar, but `fg()` computes the gradients of the outputs of `f()` with regards to all trainable variables these outputs depend on.
- `tfe.implicit_value_and_gradients()` is almost identical, but `fg()` also returns the output of the function `f()`.

Usually, in Machine Learning, you will want to compute the gradients of the loss with regards to the model parameters (i.e. variables), and you will generally also be interested in the value of the loss itself. For this use case, the simplest and most efficient options are `tf.GradientTape` and `tfe.implicit_value_and_gradients()` (the other two options do not give you the value of the loss itself, so if you need it, extra computations will be required). I personally prefer `tfe.implicit_value_and_gradients()` when writing production code, and `tf.GradientTape` when experimenting in a Jupyter notebook.

Edit: In TF 2.0, it seems that only `tf.GradientTape` remains. Maybe the other functions will be added back, but I wouldn't count on it.
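To illustrate, here is a minimal sketch of what this looks like with the TF 2.x API, where eager execution is on by default and `tf.GradientTape` is the one remaining option:

```python
import tensorflow as tf  # assumes TF 2.x, where eager execution is the default

w1 = tf.Variable(2.0)
w2 = tf.Variable(3.0)

with tf.GradientTape() as tape:
    s = w1 * 5.0 + w2 * 7.0  # a loss-like quantity computed while recording

# gradients of s with regards to both variables in one call
grads = tape.gradient(s, [w1, w2])
print([g.numpy() for g in grads])  # [5.0, 7.0]
```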
Let's create a small function to highlight the differences:

```python
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tf.enable_eager_execution()

w1 = tfe.Variable(2.0)
w2 = tfe.Variable(3.0)

def weighted_sum(x1, x2):
    return w1 * x1 + w2 * x2

s = weighted_sum(5., 7.)
print(s.numpy())  # 31
```
tf.GradientTape

Within a `GradientTape` context, all operations are recorded; you can then compute the gradients of any tensor computed within the context, with regards to any trainable variable. For example, this code computes `s` within the `GradientTape` context, and then computes the gradient of `s` with regards to `w1`. Since `s = w1 * x1 + w2 * x2`, the gradient of `s` with regards to `w1` is `x1`:

```python
with tf.GradientTape() as tape:
    s = weighted_sum(5., 7.)

[w1_grad] = tape.gradient(s, [w1])
print(w1_grad.numpy())  # 5.0 = gradient of s with regards to w1 = x1
```
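One detail worth knowing: the tape tracks trainable variables automatically, but plain tensors are only tracked if you watch them explicitly with `tape.watch()`. A minimal sketch (written here with the TF 2.x API, where eager execution is the default):

```python
import tensorflow as tf  # assumes TF 2.x

w1 = tf.Variable(2.0)
x1 = tf.constant(5.0)

with tf.GradientTape() as tape:
    tape.watch(x1)   # constants are not tracked unless watched
    s = w1 * x1

# gradients flow to the watched tensor as well as the variable
x1_grad, w1_grad = tape.gradient(s, [x1, w1])
print(x1_grad.numpy(), w1_grad.numpy())  # 2.0 5.0
```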
tfe.gradients_function()

This function returns another function that can compute the gradients of a function's returned value with regards to its parameters. For example, we can use it to define a function that will compute the gradients of `s` with regards to `x1` and `x2`:

```python
grad_fn = tfe.gradients_function(weighted_sum)
x1_grad, x2_grad = grad_fn(5., 7.)
print(x1_grad.numpy())  # 2.0 = gradient of s with regards to x1 = w1
```
In the context of optimization, it would make more sense to compute gradients with regards to variables that we can tweak. For this, we can change the `weighted_sum()` function to take `w1` and `w2` as parameters as well, and tell `tfe.gradients_function()` to only consider the parameters named `"w1"` and `"w2"`:

```python
def weighted_sum_with_weights(w1, x1, w2, x2):
    return w1 * x1 + w2 * x2

grad_fn = tfe.gradients_function(weighted_sum_with_weights, params=["w1", "w2"])
[w1_grad, w2_grad] = grad_fn(w1, 5., w2, 7.)
print(w2_grad.numpy())  # 7.0 = gradient of s with regards to w2 = x2
```
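There is no direct equivalent of `tfe.gradients_function()` in TF 2.x, but the same per-parameter behavior can be approximated with a tape. This is only a sketch, not an official replacement (the `grad_fn` wrapper is mine):

```python
import tensorflow as tf  # assumes TF 2.x

def weighted_sum_with_weights(w1, x1, w2, x2):
    return w1 * x1 + w2 * x2

# rough TF 2.x stand-in for tfe.gradients_function(..., params=["w1", "w2"])
def grad_fn(w1, x1, w2, x2):
    w1 = tf.convert_to_tensor(w1)
    w2 = tf.convert_to_tensor(w2)
    with tf.GradientTape() as tape:
        tape.watch([w1, w2])  # differentiate only with regards to w1 and w2
        s = weighted_sum_with_weights(w1, x1, w2, x2)
    return tape.gradient(s, [w1, w2])

w1_grad, w2_grad = grad_fn(2.0, 5.0, 3.0, 7.0)
print(w1_grad.numpy(), w2_grad.numpy())  # 5.0 7.0
```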
tfe.implicit_gradients()

This function returns another function that can compute the gradients of a function's returned value with regards to all trainable variables it depends on. Going back to the first version of `weighted_sum()`, we can use it to compute the gradients of `s` with regards to `w1` and `w2` without having to explicitly pass these variables. Note that the gradient function returns a list of gradient/variable pairs:

```python
grad_fn = tfe.implicit_gradients(weighted_sum)
[(w1_grad, w1_var), (w2_grad, w2_var)] = grad_fn(5., 7.)
print(w1_grad.numpy())  # 5.0 = gradient of s with regards to w1 = x1

assert w1_var is w1
assert w2_var is w2
```
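These gradient/variable pairs are exactly the format that an optimizer's `apply_gradients()` method expects, which is what makes this style convenient in training loops. A minimal sketch of one gradient-descent step, written here with the TF 2.x API (in TF 1.x, `tfe.implicit_gradients()` would produce the pairs directly):

```python
import tensorflow as tf  # assumes TF 2.x

w1 = tf.Variable(2.0)
w2 = tf.Variable(3.0)

with tf.GradientTape() as tape:
    s = w1 * 5.0 + w2 * 7.0

# pair each gradient with its variable and apply one SGD step
grads = tape.gradient(s, [w1, w2])
opt = tf.keras.optimizers.SGD(learning_rate=0.1)
opt.apply_gradients(zip(grads, [w1, w2]))
print(w1.numpy())  # 1.5 = 2.0 - 0.1 * 5.0
```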
This function does seem like the simplest and most useful option, since we are generally interested in computing the gradients of the loss with regards to the model parameters (i.e. variables).

Note: try making `w1` untrainable (`w1 = tfe.Variable(2., trainable=False)`) and redefining `weighted_sum()`, and you will see that `grad_fn` only returns the gradient of `s` with regards to `w2`.
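The TF 2.x tape behaves the same way with regards to trainability: variables created with `trainable=False` are not tracked automatically, so their gradient comes back as `None`. A quick sketch:

```python
import tensorflow as tf  # assumes TF 2.x

w1 = tf.Variable(2.0, trainable=False)  # excluded from automatic tracking
w2 = tf.Variable(3.0)

with tf.GradientTape() as tape:
    s = w1 * 5.0 + w2 * 7.0

w1_grad, w2_grad = tape.gradient(s, [w1, w2])
print(w1_grad)          # None: w1 was not watched
print(w2_grad.numpy())  # 7.0
```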
tfe.implicit_value_and_gradients()

This function is almost identical to `implicit_gradients()`, except the function it creates also returns the result of the function being differentiated (in this case `weighted_sum()`):

```python
grad_fn = tfe.implicit_value_and_gradients(weighted_sum)
s, [(w1_grad, w1_var), (w2_grad, w2_var)] = grad_fn(5., 7.)
print(s.numpy())  # 31.0 = s = w1 * x1 + w2 * x2
```

When you need both the output of a function and its gradients, this function can give you a nice performance boost, since you get the output of the function for free when computing the gradients using autodiff.
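In TF 2.x, the same value-plus-gradients pattern falls out of the tape naturally, since the forward value is computed inside the recording context anyway. A rough stand-in for `tfe.implicit_value_and_gradients()` might look like this (the `value_and_gradients` helper name is made up for illustration):

```python
import tensorflow as tf  # assumes TF 2.x

w1 = tf.Variable(2.0)
w2 = tf.Variable(3.0)

def loss_fn():
    return w1 * 5.0 + w2 * 7.0

# rough sketch of tfe.implicit_value_and_gradients(): returns the loss value
# plus gradient/variable pairs, with no extra forward pass
def value_and_gradients(f, variables):
    with tf.GradientTape() as tape:
        value = f()
    return value, list(zip(tape.gradient(value, variables), variables))

s, grads_and_vars = value_and_gradients(loss_fn, [w1, w2])
print(s.numpy())  # 31.0
```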