Shape of image after MaxPooling2D with padding='same': calculating layer-by-layer shape in a convolutional autoencoder

Question

In brief, my question is why the image size does not stay the same as the input size after a max-pooling layer, even though I use padding='same' in my Keras code. I am working through the Keras blog post "Building Autoencoders in Keras" and building a convolutional autoencoder. The autoencoder code is as follows:

from keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model

input_layer = Input(shape=(28, 28, 1))
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# at this point the representation is (4, 4, 8) i.e. 128-dimensional
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
x = Conv2D(16, (3, 3), activation='relu')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

According to autoencoder.summary(), the output of the very first layer, Conv2D(16, (3, 3), activation='relu', padding='same')(input_layer), is 28 X 28 X 16, i.e. the same spatial size as the input image. This is because padding is 'same'.

In [49]: autoencoder.summary()
(The layer numbers were added by me; they are not part of the output.)
_________________________________________________________________
  Layer (type)                 Output Shape             Param #   
=================================================================
1.input_1 (InputLayer)         (None, 28, 28, 1)         0         
_________________________________________________________________
2.conv2d_1 (Conv2D)            (None, 28, 28, 16)        160       
_________________________________________________________________
3.max_pooling2d_1 (MaxPooling2 (None, 14, 14, 16)        0         
_________________________________________________________________
4.conv2d_2 (Conv2D)            (None, 14, 14, 8)         1160      
_________________________________________________________________
5.max_pooling2d_2 (MaxPooling2 (None, 7, 7, 8)           0         
_________________________________________________________________
6.conv2d_3 (Conv2D)            (None, 7, 7, 8)           584       
_________________________________________________________________
7.max_pooling2d_3 (MaxPooling2 (None, 4, 4, 8)           0         
_________________________________________________________________
8.conv2d_4 (Conv2D)            (None, 4, 4, 8)           584       
_________________________________________________________________
9.up_sampling2d_1 (UpSampling2 (None, 8, 8, 8)           0         
_________________________________________________________________
10.conv2d_5 (Conv2D)            (None, 8, 8, 8)           584       
_________________________________________________________________
11.up_sampling2d_2 (UpSampling2 (None, 16, 16, 8)         0         
_________________________________________________________________
12.conv2d_6 (Conv2D)            (None, 14, 14, 16)        1168      
_________________________________________________________________
13.up_sampling2d_3 (UpSampling2 (None, 28, 28, 16)        0         
_________________________________________________________________
14.conv2d_7 (Conv2D)            (None, 28, 28, 1)         145       
=================================================================

The next layer (layer 3) is MaxPooling2D((2, 2), padding='same')(x). summary() shows the output size of this layer as 14 X 14 X 16. But padding in this layer is also 'same', so why does the output size not remain 28 X 28 X 16, with padded zeros?

It is also not clear how the output shape becomes (14 X 14 X 16) after layer 12, when the input shape coming from the preceding layer is (16 X 16 X 8).

Answer 1:

The next layer (layer 3) is MaxPooling2D((2, 2), padding='same')(x). summary() shows the output size of this layer as 14 X 14 X 16. But padding in this layer is also 'same', so why does the output size not remain 28 X 28 X 16, with padded zeros?

There seems to be a misunderstanding of what padding does. Padding only controls what happens at the image border; it does not cancel the downsampling caused by the stride. You have a 2x2 max-pooling operation, and in Keras the default stride equals the pooling size, so strides=2, which halves the image size. With padding='same' the output size is ceil(input size / stride), so 28 becomes 14 and 7 becomes 4. You need to specify strides=1 explicitly to keep the size unchanged. From the Keras docs:

pool_size: integer or tuple of 2 integers, factors by which to downscale (vertical, horizontal). (2, 2) will halve the input in both spatial dimension. If only one integer is specified, the same window length will be used for both dimensions.

strides: Integer, tuple of 2 integers, or None. Strides values. If None, it will default to pool_size.
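
The rule quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not Keras code: pooled_size is a hypothetical helper that assumes the usual TensorFlow-style shape rules ('same' gives ceil(n / stride), 'valid' gives floor((n - pool) / stride) + 1) and mirrors the Keras default of strides=None meaning strides=pool_size.

```python
import math

def pooled_size(n, pool_size=2, strides=None, padding="valid"):
    """Spatial output size of a pooling layer along one dimension.

    Hypothetical helper for illustration; it only reproduces the shape
    arithmetic, it does not call Keras.
    """
    if strides is None:
        strides = pool_size  # Keras default: stride equals the pool size
    if padding == "same":
        return math.ceil(n / strides)
    if padding == "valid":
        return (n - pool_size) // strides + 1
    raise ValueError("padding must be 'same' or 'valid'")

print(pooled_size(28, padding="same"))             # 14 (layer 3)
print(pooled_size(7, padding="same"))              # 4  (layer 7)
print(pooled_size(7, padding="valid"))             # 3
print(pooled_size(28, strides=1, padding="same"))  # 28
```

The last call corresponds to MaxPooling2D((2, 2), strides=1, padding='same'): only with a stride of 1 does 'same' padding preserve the input size. The 7 to 4 case also shows what 'same' actually buys here: the odd-sized input is padded so the last column and row are not dropped.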

For the second question:

It is also not clear how the output shape becomes (14 X 14 X 16) after layer 12, when the input shape coming from the preceding layer is (16 X 16 X 8).

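Layer 12 does not specify padding='same', so it uses the default, padding='valid'. A 3x3 kernel with no padding trims one pixel from each border, so each spatial dimension shrinks from 16 to 16 - 3 + 1 = 14.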
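
To see how this plays out across the whole network, here is a small sketch that traces one spatial dimension through the 13 layers after the input. The helpers out_size and trace are hypothetical names introduced for illustration, and the shape rules are the same assumed TensorFlow-style ones ('same' gives ceil(n / stride), 'valid' gives floor((n - window) / stride) + 1); no Keras calls are made.

```python
import math

def out_size(n, window, stride, padding):
    """Spatial size after a conv or pooling layer along one dimension."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - window) // stride + 1  # 'valid': no padding

def trace(layer12_padding):
    """Return the spatial size after the input and after each layer."""
    # (kind, window or factor, stride, padding); "up" multiplies the size.
    layers = [
        ("conv", 3, 1, "same"), ("pool", 2, 2, "same"),
        ("conv", 3, 1, "same"), ("pool", 2, 2, "same"),
        ("conv", 3, 1, "same"), ("pool", 2, 2, "same"),
        ("conv", 3, 1, "same"), ("up", 2, None, None),
        ("conv", 3, 1, "same"), ("up", 2, None, None),
        ("conv", 3, 1, layer12_padding), ("up", 2, None, None),
        ("conv", 3, 1, "same"),
    ]
    n = 28
    sizes = [n]
    for kind, window, stride, padding in layers:
        n = n * window if kind == "up" else out_size(n, window, stride, padding)
        sizes.append(n)
    return sizes

print(trace("valid"))  # [28, 28, 14, 14, 7, 7, 4, 4, 8, 8, 16, 14, 28, 28]
print(trace("same"))   # [28, 28, 14, 14, 7, 7, 4, 4, 8, 8, 16, 16, 32, 32]
```

The first list matches the summary() output above. The second shows that adding padding='same' to layer 12 would produce a 32 x 32 output, which no longer matches the 28 x 28 input; in this particular network the shrink from 16 to 14 is what lets the final upsampling land back on 28.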

Source: https://stackoverflow.com/questions/46387056/shape-of-image-after-maxpooling2d-with-padding-same-calculating-layer-by-la

Tags

deep-learning

keras-layer

keras-2