What does TensorFlow's `conv2d_transpose()` operation do?

悲&欢浪女 · 2021-01-30 04:07

The documentation for the conv2d_transpose() operation does not clearly explain what it does:

The transpose of conv2d.

This operation is sometimes called "deconvolution" after Deconvolutional Networks, but is actually the transpose (gradient) of conv2d rather than an actual deconvolution.

6 Answers

  •  栀梦 · 2021-01-30 04:41

    Here's another viewpoint from the "gradients" perspective, i.e. why TensorFlow documentation says conv2d_transpose() is "actually the transpose (gradient) of conv2d rather than an actual deconvolution". For more details on the actual computation done in conv2d_transpose, I would highly recommend this article, starting from page 19.

    Four Related Functions

    In tf.nn, there are 4 closely related and rather confusing functions for 2d convolution:

    • tf.nn.conv2d
    • tf.nn.conv2d_backprop_filter
    • tf.nn.conv2d_backprop_input
    • tf.nn.conv2d_transpose

    One-sentence summary: they are all just 2d convolutions. They differ in their argument ordering, whether the input is rotated or transposed, their strides (including fractional strides), and their padding. Given tf.nn.conv2d, one can implement the other three ops by transforming the inputs and adjusting the conv2d arguments.

    Problem Settings

    • Forward and backward computations:
    # forward
    out = conv2d(x, w)
    
    # backward, given d_out
    => find d_x?
    => find d_w?
    

    In the forward computation, we compute the convolution of input image x with the filter w, and the result is out. In the backward computation, assume we're given d_out, which is the gradient w.r.t. out. Our goal is to find d_x and d_w, which are the gradient w.r.t. x and w respectively.

    For ease of discussion, we assume:

    • All strides are 1
    • in_channels and out_channels are both 1
    • VALID padding is used
    • The filter size is odd, which avoids some asymmetric-shape problems

    Short Answer

    Conceptually, with the assumptions above, we have the following relations:

    out = conv2d(x, w, padding='VALID')
    d_x = conv2d(d_out, rot180(w), padding='FULL')
    d_w = conv2d(x, d_out, padding='VALID')
    

    where rot180(w) is w rotated by 180 degrees (a left-right flip followed by a top-down flip), and FULL padding means "apply the filter wherever it partly overlaps with the input" (see the theano docs). For example, with a 3x3 filter, FULL padding adds two rows/columns of zeros on every side, so a 5x5 input produces a 7x7 output. Note that these relations only hold under the above assumptions; however, one can change the conv2d arguments to generalize them.

    The key takeaways:

    • The input gradient d_x is the convolution of the output gradient d_out and the weight w, with some modifications.
    • The weight gradient d_w is the convolution of the input x and the output gradient d_out, with some modifications.
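
    To make these relations concrete, here is a minimal numpy/scipy sketch of them (illustrative, separate from the full scripts linked below). Note that TF's conv2d actually computes cross-correlation, which is exactly what scipy.signal.correlate2d does; the sketch also verifies d_x against a finite-difference gradient:

    import numpy as np
    from scipy.signal import correlate2d

    def rot180(m):
        # a left-right flip plus a top-down flip == a 180-degree rotation
        return np.rot90(m, 2)

    rng = np.random.default_rng(0)
    x = rng.standard_normal((5, 5))   # input image
    w = rng.standard_normal((3, 3))   # odd-sized filter

    # forward: conv2d with VALID padding (cross-correlation)
    out = correlate2d(x, w, mode='valid')

    # take the scalar loss L = sum(out * d_out) for a given upstream d_out
    d_out = rng.standard_normal(out.shape)

    # the two relations from the short answer
    d_x = correlate2d(d_out, rot180(w), mode='full')   # FULL padding
    d_w = correlate2d(x, d_out, mode='valid')          # VALID padding

    # finite-difference check of d_x (an analogous check works for d_w)
    eps = 1e-6
    L = (out * d_out).sum()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x_pert = x.copy()
            x_pert[i, j] += eps
            L_pert = (correlate2d(x_pert, w, mode='valid') * d_out).sum()
            assert np.isclose(d_x[i, j], (L_pert - L) / eps)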

    Long Answer

    Now, let's give an actual working code example of how to use the 4 functions above to compute d_x and d_w given d_out. This shows how conv2d, conv2d_backprop_filter, conv2d_backprop_input, and conv2d_transpose are related to each other. Please find the full scripts here.
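
    The snippets below assume a small TF1-style setup along the following lines; the exact shapes and the random inputs here are illustrative assumptions, and the full scripts define their own precise setup:

    import numpy as np
    import tensorflow.compat.v1 as tf  # the snippets use the TF1 API
    tf.disable_eager_execution()

    # illustrative shapes, consistent with the assumptions above
    x_shape = [1, 5, 5, 1]           # NHWC: batch 1, one channel
    w_shape = [3, 3, 1, 1]           # HWIO: odd filter size, 1 in / 1 out channel
    w_size = w_shape[0]
    strides = [1, 1, 1, 1]

    x = tf.constant(np.random.rand(*x_shape), dtype=tf.float32)
    w = tf.constant(np.random.rand(*w_shape), dtype=tf.float32)

    out = tf.nn.conv2d(x, w, strides=strides, padding='VALID')
    f = tf.reduce_sum(out)           # a scalar loss, so that gradients are defined
    d_out = tf.gradients(f, out)[0]  # here simply a tensor of ones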

    Computing d_x in 4 different ways:

    # Method 1: TF's autodiff
    d_x = tf.gradients(f, x)[0]
    
    # Method 2: manually using conv2d
    d_x_manual = tf.nn.conv2d(input=tf_pad_to_full_conv2d(d_out, w_size),
                              filter=tf_rot180(w),
                              strides=strides,
                              padding='VALID')
    
    # Method 3: conv2d_backprop_input
    d_x_backprop_input = tf.nn.conv2d_backprop_input(input_sizes=x_shape,
                                                     filter=w,
                                                     out_backprop=d_out,
                                                     strides=strides,
                                                     padding='VALID')
    
    # Method 4: conv2d_transpose
    d_x_transpose = tf.nn.conv2d_transpose(value=d_out,
                                           filter=w,
                                           output_shape=x_shape,
                                           strides=strides,
                                           padding='VALID')
    
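    With a setup like the one above, one can check in a session that all four tensors agree numerically:

    with tf.Session() as sess:
        d_x_vals = sess.run([d_x, d_x_manual, d_x_backprop_input, d_x_transpose])
        assert all(np.allclose(d_x_vals[0], v) for v in d_x_vals[1:])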

    Computing d_w in 3 different ways:

    # Method 1: TF's autodiff
    d_w = tf.gradients(f, w)[0]
    
    # Method 2: manually using conv2d
    d_w_manual = tf_NHWC_to_HWIO(tf.nn.conv2d(input=x,
                                              filter=tf_NHWC_to_HWIO(d_out),
                                              strides=strides,
                                              padding='VALID'))
    
    # Method 3: conv2d_backprop_filter
    d_w_backprop_filter = tf.nn.conv2d_backprop_filter(input=x,
                                                       filter_sizes=w_shape,
                                                       out_backprop=d_out,
                                                       strides=strides,
                                                       padding='VALID')
    
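    And likewise for the three d_w tensors:

    with tf.Session() as sess:
        d_w_vals = sess.run([d_w, d_w_manual, d_w_backprop_filter])
        assert all(np.allclose(d_w_vals[0], v) for v in d_w_vals[1:])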

    Please see the full scripts for the implementation of tf_rot180, tf_pad_to_full_conv2d, tf_NHWC_to_HWIO. In the scripts, we check that the final output values of different methods are the same; a numpy implementation is also available.
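
    For reference, here is one plausible minimal version of those three helpers, consistent with how they are used above (the full scripts contain the authoritative implementations):

    def tf_rot180(w):
        # rotate an HWIO filter by 180 degrees in the spatial (H, W) plane
        return tf.reverse(w, axis=[0, 1])

    def tf_pad_to_full_conv2d(x, w_size):
        # zero-pad an NHWC tensor so that a VALID conv acts like a FULL conv
        return tf.pad(x, [[0, 0],
                          [w_size - 1, w_size - 1],
                          [w_size - 1, w_size - 1],
                          [0, 0]])

    def tf_NHWC_to_HWIO(out):
        # reinterpret an NHWC tensor as an HWIO filter (and back)
        return tf.transpose(out, perm=[1, 2, 0, 3])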
