Tensorflow gradient with respect to matrix

房东的猫 提交于 2019-12-06 03:07:10

The gradient expects a scalar function, so by default, it sums up the entries. That is the default behavior simply because all of the gradient descent algorithms need that type of functionality, and stochastic gradient descent (or variations thereof) are the preferred methods inside Tensorflow. You won't find any of the more advanced algorithms (like BFGS or something) because they simply haven't been implemented yet (and they would require a true Jacobian, which also hasn't been implemented). For what its worth, here is a functioning Jacobian implementation that I wrote:

def map(f, x, dtype=None, parallel_iterations=10):
    '''
    Apply f to each of the elements in x using the specified number of parallel iterations.

    Important points:
    1. By "elements in x", we mean that we will be applying f to x[0],...x[tf.shape(x)[0]-1].
    2. The output size of f(x[i]) can be arbitrary. However, if the dtype of that output
       is different than the dtype of x, then you need to specify that as an additional argument.
    '''
    if dtype is None:
        dtype = x.dtype

    n = tf.shape(x)[0]
    loop_vars = [
        tf.constant(0, n.dtype),
        tf.TensorArray(dtype, size=n),
    ]
    _, fx = tf.while_loop(
        lambda j, _: j < n,
        lambda j, result: (j + 1, result.write(j, f(x[j]))),
        loop_vars,
        parallel_iterations=parallel_iterations
    )
    return fx.stack()

def jacobian(fx, x, parallel_iterations=10):
    '''
    Given a tensor fx, which is a function of x, vectorize fx (via tf.reshape(fx, [-1])),
    and then compute the jacobian of each entry of fx with respect to x.
    Specifically, if x has shape (m,n,...,p), and fx has L entries (tf.size(fx)=L), then
    the output will be (L,m,n,...,p), where output[i] will be (m,n,...,p), with each entry denoting the
    gradient of output[i] wrt the corresponding element of x.
    '''
    return map(lambda fxi: tf.gradients(fxi, x)[0],
               tf.reshape(fx, [-1]),
               dtype=x.dtype,
               parallel_iterations=parallel_iterations)

While this implementation works, it does not work when you try to nest it. For instance, if you try to compute the Hessian by using jacobian( jacobian( ... )), then you get some strange errors. This is being tracked as Issue 675. I am still awaiting a response on why this throws an error. I believe that there is a deep-seated bug in either the while loop implementation or the gradient implementation, but I really have no idea.

Anyway, if you just need a jacobian, try the code above.

pipsqueaker117

tf.gradients actually sums the ys and calculates the gradient of that, which is why this issue is occurring.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!