Tensorflow cannot get gradient wrt a Variable, but can wrt a Tensor

Submitted by 五迷三道 on 2019-12-22 11:29:38

Question


I am interested in computing the gradient of a loss calculated from a matrix-multiplication product in TensorFlow with eager execution. I can do so if the product is computed as a Tensor, but not if it is assign()ed in place to a Variable. Here is the greatly reduced code:

import tensorflow as tf
import numpy as np
tf.enable_eager_execution()

multipliers_net = tf.get_variable("multipliers", shape=(1, 3, 3, 1),
                                  initializer=tf.random_normal_initializer())
activations_net = tf.Variable(tf.ones_like(multipliers_net))
output_indices = [(0, 1, 2, 0)]
LEARNING_RATE = 0.1  # hypothetical value; used below but never defined in the original snippet

def step():
    global activations_net

    #### PROBLEMATIC ####
    activations_net.assign(multipliers_net * activations_net)
    #### NO PROBLEM ####
    # activations_net = multipliers_net * activations_net

    return tf.gather_nd(activations_net, output_indices)


def train(targets):
    for y in targets:
        with tf.GradientTape() as tape:
            out = step()
            print("OUT", out)
            loss = tf.reduce_mean(tf.square(y - out))
            print("LOSS", loss)
        de_dm = tape.gradient(loss, multipliers_net)
        print("GRADIENT", de_dm, sep="\n")
        multipliers_net.assign(LEARNING_RATE * de_dm)


targets = [[2], [3], [4], [5]]

train(targets)

As it stands, this code will print the correct OUT and LOSS values, but GRADIENT will be printed as None. However, if the line below "PROBLEMATIC" is commented out and the "NO PROBLEM" line is uncommented, the gradient is computed just fine. I infer this is because in the second case activations_net becomes a Tensor, but I don't know why that suddenly makes the gradient computable when it was not before.
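To isolate the contrast, here is a minimal sketch of my own (not from the original post) using scalar variables; it reproduces the same behavior:

import tensorflow as tf
tf.enable_eager_execution()

w = tf.Variable(1.0, name="w")
v = tf.Variable(2.0, name="v")

# assign() in place, as in the PROBLEMATIC line: the tape records no
# differentiable op connecting w to the output.
with tf.GradientTape() as tape:
    v.assign(w * v)
    out = v * 1.0  # read the variable back
print(tape.gradient(out, w))  # None

# Plain multiplication, as in the NO PROBLEM line: the product is a
# Tensor the tape can trace back to w.
with tf.GradientTape() as tape:
    out = w * v
print(tape.gradient(out, w))  # a real tensor equal to v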

I'm pretty sure that I should keep activations_net and multipliers_net as Variables, because in the larger scheme of things both are updated dynamically, and as I understand it such state is best kept in Variables rather than in constantly reassigned Tensors.


Answer 1:


I'll try to explain to the best of my knowledge. The problem occurs in this line:

de_dm = tape.gradient(loss, multipliers_net)

If you print tape.watched_variables() in both the "PROBLEMATIC" and "NO PROBLEM" cases, you'll see that in the first case the tape 'watches' the same multipliers_net variable twice. You can try tape.reset() and tape.watch(), but they will have no effect as long as you pass an assign op into the tape. If you try multipliers_net.assign(any_variable) inside tf.GradientTape(), you'll find that it won't work. But if you assign something that produces a tensor, e.g. tf.ones_like(), it will work.
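As a quick check (my sketch, not part of the original answer; it reuses multipliers_net, activations_net, output_indices, and step() from the question's code):

# PROBLEMATIC case: step() calls assign() inside the tape.
with tf.GradientTape() as tape:
    out = step()
print([v.name for v in tape.watched_variables()])

# NO PROBLEM case: the product stays a Tensor inside the tape.
with tf.GradientTape() as tape:
    out = tf.gather_nd(multipliers_net * activations_net, output_indices)
print([v.name for v in tape.watched_variables()])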

multipliers_net.assign(LEARNING_RATE * de_dm)

This line works for the same reason: assign() seems to accept only eager tensors. Hope this helps.
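One workaround consistent with this explanation (my own sketch, not part of the original answer) is to build the new activations as a plain Tensor inside the tape, take the gradient, and only then call assign() outside the tape. It also uses assign_sub() for a conventional descent step, which differs from the plain assign() in the question:

# Reuses multipliers_net, activations_net, output_indices, and
# LEARNING_RATE from the question's code above.
def train(targets):
    for y in targets:
        with tf.GradientTape() as tape:
            # Plain multiplication yields a Tensor the tape can trace.
            new_activations = multipliers_net * activations_net
            out = tf.gather_nd(new_activations, output_indices)
            loss = tf.reduce_mean(tf.square(y - out))
        de_dm = tape.gradient(loss, multipliers_net)  # no longer None
        # State updates happen outside the tape, where assign() is fine.
        activations_net.assign(new_activations)
        multipliers_net.assign_sub(LEARNING_RATE * de_dm)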



Source: https://stackoverflow.com/questions/55155847/tensorflow-cannot-get-gradient-wrt-a-variable-but-can-wrt-a-tensor
