Question
I am currently developing a CNN model with Keras (an autoencoder). This time my inputs are of shape (47,47,3), that is, a 47x47 image with 3 (RGB) channels.
I have worked with some CNNs in the past, but this time my input dimensions are prime numbers (47 pixels). I think this is causing issues with my implementation, specifically when using MaxPooling2D and UpSampling2D in my model. I noticed that some dimensions are lost when max pooling and then up sampling.
Using model.summary() I can see that after passing my (47,47,3) input through a Conv2D(24) and MaxPooling with a (2,2) kernel (that is, 24 filters and half the spatial size) I get an output shape of (24, 24, 24).
Now, if I try to reverse that by UpSampling with a (2,2) kernel (doubling the spatial size) and convolving again, I get a (48,48,3) output. That is one row and one column more than I need.
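For illustration, a minimal sketch that reproduces the mismatch (the layer sizes here are my assumption, not necessarily the asker's exact architecture; it assumes padding='same' on the pooling layer, which is what yields 24 rather than 23 after pooling 47):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(47, 47, 3)),
    layers.Conv2D(24, (3, 3), activation='relu', padding='same'),    # -> (47, 47, 24)
    layers.MaxPooling2D((2, 2), padding='same'),                     # -> (24, 24, 24): ceil(47 / 2) = 24
    layers.UpSampling2D((2, 2)),                                     # -> (48, 48, 24)
    layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same'),  # -> (48, 48, 3), one row/column too many
])
model.summary()  # final output shape is (48, 48, 3), not (47, 47, 3)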
Faced with this, I thought "no problem, just choose a kernel size that gives you the desired 47 pixels when up sampling", but given that 47 is a prime number it seems there is no kernel size that can do that: UpSampling2D multiplies each dimension by an integer factor, so to recover 47 I would need a pooled size s and a factor f with s*f = 47, and since 47 is prime the only factorizations are 1x47 and 47x1.
Is there any way to bypass this problem that does not involve changing my input dimensions to a non-prime number? Maybe I am missing something in my approach, or maybe Keras has some feature I am not aware of that could help here.
Answer 1:
I advise you to use ZeroPadding2D and Cropping2D. You can pad your image asymmetrically with 0s and obtain an even size without resizing the image itself. This should solve the problem with upsampling. Moreover, remember to set padding='same' in all of your convolutional layers.
EDIT:
Just to give you an example strategy for how to perform such operations (a sketch follows the list):
- If the feature map size is odd before pooling, zero pad it to make it even.
- After the corresponding upsample operation, use cropping to bring the feature map back to its original odd size.
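A minimal sketch of that pad-then-crop strategy for the (47,47,3) input (the layer sizes are illustrative, not prescribed by the answer):

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(47, 47, 3)),
    layers.ZeroPadding2D(((0, 1), (0, 1))),                          # pad one row and one column: 47 -> 48 (even)
    layers.Conv2D(24, (3, 3), activation='relu', padding='same'),    # -> (48, 48, 24)
    layers.MaxPooling2D((2, 2)),                                     # -> (24, 24, 24), nothing is lost since 48 is even
    layers.UpSampling2D((2, 2)),                                     # -> (48, 48, 24)
    layers.Conv2D(3, (3, 3), activation='sigmoid', padding='same'),  # -> (48, 48, 3)
    layers.Cropping2D(((0, 1), (0, 1))),                             # drop the padded row and column: 48 -> 47
])
model.summary()  # final output shape is (47, 47, 3), matching the input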
Source: https://stackoverflow.com/questions/46938504/issues-training-cnn-with-prime-number-input-dimensions