Getting the output shape of a deconvolution layer using tf.nn.conv2d_transpose in TensorFlow

终归单人心 2020-12-15 01:39

According to this paper, the output shape is N + H - 1, where N is the input height or width and H is the kernel height or width. This is obvious inver

3 Answers
  • 2020-12-15 02:22

    The formula for the output size in the tutorial assumes that the padding P is the same before and after the image (left and right, or top and bottom). With a stride of 1, the number of positions at which you can place the kernel is then W (size of the image) - F (size of the kernel) + P (padding before) + P (padding after) + 1.

    But TensorFlow also handles the case where more pixels must be padded on one side than on the other so that the kernel fits correctly. You can read more about the padding strategies ("SAME" and "VALID") in the docs. The test you're talking about uses "VALID".
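
    As a rough sketch of the two cases above, the helpers below compute the output length along one dimension of a forward convolution. The function names are hypothetical. The "SAME" and "VALID" rules follow the TensorFlow padding documentation, and the symmetric version assumes the same padding P on both sides.

```python
import math


def conv_out_size_symmetric(w, f, p, s=1):
    """(W - F + 2P) / S + 1, assuming the same padding P on both sides."""
    return (w - f + 2 * p) // s + 1


def conv_out_size_tf(w, f, s, padding):
    """Output length under TensorFlow's "SAME" / "VALID" rules."""
    if padding == "VALID":
        # Kernel must fit entirely inside the input; no padding is added.
        return math.ceil((w - f + 1) / s)
    if padding == "SAME":
        # Input is padded (possibly unevenly) so that out = ceil(W / S).
        return math.ceil(w / s)
    raise ValueError(f"unknown padding: {padding!r}")


print(conv_out_size_symmetric(5, 3, 1))      # 5-wide image, 3-wide kernel, P = 1
print(conv_out_size_tf(13, 3, 2, "VALID"))   # sizes from the "VALID" test case
print(conv_out_size_tf(12, 3, 2, "SAME"))    # sizes from the "SAME" test case
```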

  • 2020-12-15 02:37

    For deconvolution,

    output_size = strides * (input_size-1) + kernel_size - 2*padding
    

    where strides, input_size, kernel_size and padding are integers, and padding is zero for 'VALID'.
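
    The formula above can be checked against the shapes discussed in this thread with a minimal sketch. The function name deconv_out_size is hypothetical, and the padding argument is assumed to be the same integer on both sides.

```python
def deconv_out_size(input_size, kernel_size, strides, padding=0):
    """Output length of a transposed convolution along one dimension.

    Assumes symmetric integer padding; padding = 0 corresponds to 'VALID'.
    """
    return strides * (input_size - 1) + kernel_size - 2 * padding


# 'VALID' test case: x_shape = [2, 6, 4, 3], 3x3 kernel, stride 2.
print(deconv_out_size(6, 3, 2))  # height
print(deconv_out_size(4, 3, 2))  # width

# With stride 1 and no padding this reduces to N + H - 1 from the question.
n, h = 5, 3
print(deconv_out_size(n, h, 1) == n + h - 1)
```

    For the 'VALID' test case this gives 13 and 9, matching y_shape = [2, 13, 9, 2].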

  • 2020-12-15 02:38

    This discussion is really helpful; here is some additional information. With padding='SAME', the bottom and right sides can receive one extra padded pixel. This is consistent with the TensorFlow documentation and with the test case below, which uses padding='SAME':

    strides = [1, 2, 2, 1]
    # Input, output: [batch, height, width, depth]
    x_shape = [2, 6, 4, 3]
    y_shape = [2, 12, 8, 2]
    
    # Filter: [kernel_height, kernel_width, output_depth, input_depth]
    f_shape = [3, 3, 2, 3]
    

    We can interpret padding='SAME' as:

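    (H−F+pad_along_height)/S+1 = out_height,
    (W−F+pad_along_width)/S+1 = out_width.
    

    Here H and W are the height and width of the larger tensor (y_shape, which is the input of the corresponding forward convolution), F is the kernel size, S is the stride, and out_height and out_width come from x_shape. So (12 - 3 + pad_along_height) / 2 + 1 = 6, which gives pad_along_height = 1. Then pad_top = pad_along_height / 2 = 1 / 2 = 0 (integer division) and pad_bottom = pad_along_height - pad_top = 1.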
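
    The same numbers fall out of the 'SAME' padding rule in the TensorFlow documentation. The sketch below is a plain-Python transcription of that rule for one dimension; the function name same_padding is hypothetical.

```python
def same_padding(in_size, filter_size, stride):
    """Return (pad_before, pad_after) for padding='SAME' along one dimension."""
    if in_size % stride == 0:
        pad_along = max(filter_size - stride, 0)
    else:
        pad_along = max(filter_size - (in_size % stride), 0)
    pad_before = pad_along // 2           # top or left (integer division)
    pad_after = pad_along - pad_before    # bottom or right gets the extra pixel
    return pad_before, pad_after


# Height 12 and width 8 from y_shape, 3x3 kernel, stride 2.
print(same_padding(12, 3, 2))  # (pad_top, pad_bottom)
print(same_padding(8, 3, 2))   # (pad_left, pad_right)
```

    Both calls return (0, 1): the single padded pixel goes to the bottom and to the right.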

    As for padding='VALID', as the name suggests, the kernel is applied only at valid positions, that is, where it fits entirely inside the original input image region, so no zero padding is added. For example, in the test case below,

    strides = [1, 2, 2, 1]
    
    # Input, output: [batch, height, width, depth]
    x_shape = [2, 6, 4, 3]
    y_shape = [2, 13, 9, 2]
    
    # Filter: [kernel_height, kernel_width, output_depth, input_depth]
    f_shape = [3, 3, 2, 3]
    

    The output shape of conv2d is

    out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))
               = ceil(float(13 - 3 + 1) / float(2)) = ceil(11/2) = 6
               = (W−F)/S + 1.
    

    Because (W−F)/S+1 = (13-3)/2+1 = 6 is an integer, we don't need to add zero pixels around the border of the image, so the padding amounts described in the TensorFlow documentation (pad_top, pad_left and so on) are all 0 for padding='VALID'.
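
    One consequence of the ceil in the 'VALID' rule is that more than one larger size can map to the same convolution output, which is consistent with tf.nn.conv2d_transpose taking an explicit output_shape argument. A small sketch, with the hypothetical helper valid_out_size, that lists the candidates for this test case:

```python
import math


def valid_out_size(in_size, filter_size, stride):
    """Forward conv2d output length for padding='VALID' along one dimension."""
    return math.ceil((in_size - filter_size + 1) / stride)


# Which heights / widths convolve down to x_shape's 6 and 4
# with a 3x3 kernel and stride 2?
heights = [n for n in range(3, 30) if valid_out_size(n, 3, 2) == 6]
widths = [n for n in range(3, 30) if valid_out_size(n, 3, 2) == 4]
print(heights)
print(widths)
```

    The test case's 13 and 9 are the smallest candidates in each list; 14 and 10 also convolve down to the same sizes.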
