Tensorflow: How to write op with gradient in python?

失恋的感觉 2020-11-27 18:28

I would like to write a TensorFlow op in python, but I would like it to be differentiable (to be able to compute a gradient).

This question asks how to write an op i

2 Answers
  • 2020-11-27 18:51

    Yes, as mentioned in @Yaroslav's answer, it is possible and the key is the links he references: here and here. I want to elaborate on this answer by giving a concrete example.

    Modulo operation: Let's implement the element-wise modulo operation in tensorflow (it already exists but its gradient is not defined; for the sake of the example we will implement it from scratch).

    Numpy function: The first step is to define the operation we want for numpy arrays. The element-wise modulo operation is already implemented in numpy, so this is easy:

    import numpy as np
    def np_mod(x,y):
        return (x % y).astype(np.float32)
    

    The .astype(np.float32) is there because tensorflow works with float32 by default, and it will complain if you feed it float64 (the numpy default).
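
    A quick sanity check in plain numpy, just to confirm the dtype (not part of the original answer):

    print(np_mod(np.array([0.3, 0.7]), np.array([0.2, 0.5])).dtype)  # prints float32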

    Gradient function: Next we need to define the gradient function for our operation, for each input of the operation, as a tensorflow function. The function needs to take a very specific form: it needs to take the tensorflow representation of the operation op and the gradient of the output grad and say how to propagate the gradients. In our case, the gradients of the mod operation are easy: the derivative is 1 with respect to the first argument and -floor(x/y) with respect to the second (almost everywhere, and infinite at a finite number of spots, but let's ignore that; see https://math.stackexchange.com/questions/1849280/derivative-of-remainder-function-wrt-denominator for details). So we have

    def modgrad(op, grad):
        x = op.inputs[0]  # the first argument (normally you need the inputs to compute the gradient, e.g. the gradient of x^2 is 2x)
        y = op.inputs[1]  # the second argument

        # the propagated gradient with respect to the first and second argument respectively
        # (note: tf.neg was renamed tf.negative in tensorflow 1.0)
        return grad * 1, grad * tf.neg(tf.floordiv(x, y))
    

    The grad function needs to return an n-tuple where n is the number of arguments of the operation. Notice that we need to return tensorflow functions of the inputs.
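
    For a single-input operation the tuple has just one element. For instance, a gradient function for an element-wise square op could look like the following sketch (this is the kind of _MySquareGrad example referred to in the code comment below):

    def _MySquareGrad(op, grad):
        x = op.inputs[0]
        return grad * 2.0 * x  # derivative of x^2 is 2x, scaled by the incoming gradient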

    Making a TF function with gradients: As explained in the sources mentioned above, there is a hack to define the gradients of a function using tf.RegisterGradient and tf.Graph.gradient_override_map.

    Copying the code from harpone, we can modify the tf.py_func function to make it define the gradient at the same time:

    import tensorflow as tf
    
    def py_func(func, inp, Tout, stateful=True, name=None, grad=None):
    
        # Need to generate a unique name to avoid duplicates:
        rnd_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
    
        tf.RegisterGradient(rnd_name)(grad)  # see _MySquareGrad for grad example
        g = tf.get_default_graph()
        with g.gradient_override_map({"PyFunc": rnd_name}):
            return tf.py_func(func, inp, Tout, stateful=stateful, name=name)
    

    The stateful option tells tensorflow whether the function always gives the same output for the same input (stateful=False), in which case tensorflow can simplify the tensorflow graph. This is our case and will probably be the case in most situations; see the small variation sketched below.
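
    Since np_mod is deterministic, the call inside tf_mod below could equally pass stateful=False explicitly (a minor variation, not in the original code, which leaves the default stateful=True):

    z = py_func(np_mod, [x, y], [tf.float32], stateful=False, name=name, grad=modgrad)  # stateful=False: np_mod is deterministic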

    Combining it all together: Now that we have all the pieces, we can put them together:

    from tensorflow.python.framework import ops

    def tf_mod(x, y, name=None):

        with ops.op_scope([x, y], name, "mod") as name:
            z = py_func(np_mod,
                        [x, y],
                        [tf.float32],
                        name=name,
                        grad=modgrad)  # <-- here's the call to the gradient
            return z[0]
    

    tf.py_func acts on lists of tensors (and returns a list of tensors), which is why we have [x,y] (and return z[0]). And now we are done and can test it.

    Test:

    with tf.Session() as sess:
    
        x = tf.constant([0.3,0.7,1.2,1.7])
        y = tf.constant([0.2,0.5,1.0,2.9])
        z = tf_mod(x,y)
        gr = tf.gradients(z, [x,y])
        tf.initialize_all_variables().run()  # deprecated since TF 0.12 in favor of tf.global_variables_initializer(); harmless here, there are no variables
    
        print(x.eval(), y.eval(),z.eval(), gr[0].eval(), gr[1].eval())
    

    [ 0.30000001 0.69999999 1.20000005 1.70000005] [ 0.2 0.5 1. 2.9000001] [ 0.10000001 0.19999999 0.20000005 1.70000005] [ 1. 1. 1. 1.] [ -1. -1. -1. 0.]

    Success! As a sanity check, the second gradient matches -floor(x/y): for instance -floor(0.3/0.2) = -1 and -floor(1.7/2.9) = 0.
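
    As a side note (not part of the original answer): on TensorFlow >= 1.7 the same thing can be done without the override hack via tf.custom_gradient. A minimal sketch for the same mod operation (the name tf_mod_v2 is just for illustration):

    import tensorflow as tf

    @tf.custom_gradient
    def tf_mod_v2(x, y):  # illustrative rewrite, not from the original answer
        z = x % y  # tensorflow overloads % as element-wise floormod

        def grad(dz):
            # same gradients as modgrad above: 1 w.r.t. x and -floor(x/y) w.r.t. y
            return dz, dz * -tf.floor(x / y)

        return z, grad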

  • 2020-11-27 19:00

    Here's an example of adding a gradient to a specific py_func: https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342

    Here's the issue discussion
