Building CNN + LSTM in Keras for a regression problem. What are proper shapes?

Submitted by 浪子不回头ぞ on 2020-06-17 00:02:34

Question


I am working on a regression problem where I feed a set of spectrograms to a CNN + LSTM architecture in Keras. My data is shaped as (n_samples, width, height, n_channels). My question is how to properly connect the CNN to the LSTM layer: the data needs to be reshaped in some way when the convolution output is passed to the LSTM. There are several ideas, such as using the TimeDistributed wrapper in combination with reshaping, but I could not manage to make it work.

height = 256
width = 256
n_channels = 3
seq_length = 1 #?

I started out with this network:

i = Input(shape=(width, height, n_channels))
conv1 = Conv2D(filters=32,
               activation='relu',
               kernel_size=(2, 2),
               padding='same')(i)
lstm1 = LSTM(units=128,
             activation='tanh',
             return_sequences=False)(conv1)
o = Dense(1)(lstm1)

This gives the error:

ValueError: Input 0 of layer lstm is incompatible with the layer: expected ndim=3, found ndim=4. Full shape received: [None, 256, 256, 32]

I found a thread suggesting to reshape. Below is an example of how I applied the information given in the thread here. It requires adding the TimeDistributed wrapper.

i = Input(shape=(seq_length, width, height, n_channels))
conv1 = TimeDistributed(Conv2D(filters=32,
                               activation='relu',
                               kernel_size=(2, 2),
                               padding='same'))(i)
conv1 = Reshape((seq_length, height*width*n_channels))(conv1)
lstm1 = LSTM(units=128,
             activation='tanh',
             return_sequences=False)(conv1)
o = Dense(1)(lstm1)

This results in:

ValueError: Error when checking input: expected input_1 to have 5 dimensions, but got array with shape (5127, 256, 256, 3)
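The error says the array is 4-D while the Input layer now expects 5-D, so presumably the data itself also needs a singleton time axis (a sketch with numpy, using a small dummy array in place of my real (5127, 256, 256, 3) spectrograms):

```python
import numpy as np

# Small stand-in for the real spectrogram array of shape (5127, 256, 256, 3).
X = np.zeros((4, 256, 256, 3), dtype=np.float32)

# Insert a singleton time axis after the sample axis so the array matches
# Input(shape=(seq_length, width, height, n_channels)) with seq_length = 1.
X_seq = np.expand_dims(X, axis=1)
print(X_seq.shape)  # (4, 1, 256, 256, 3)
```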

In the example from the SO thread above, however, the network is trained on video sequences, hence the need for TimeDistributed(?). In my case, I have a set of spectrograms that originate from a signal; I am not training on video. So an idea was to set time_steps to 1 to overcome this. Something similar was done here. The input layer then becomes:

Input(shape=(seq_length, width, height, n_channels))

This results in an error on the reshape operation:

ValueError: total size of new array must be unchanged
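The sizes genuinely differ here: the TimeDistributed Conv2D outputs 32 channels per pixel, while the Reshape target above still assumes the original 3 input channels (a quick arithmetic check):

```python
import numpy as np

# Shapes from the question (per sample, excluding the batch axis).
seq_length, width, height, n_channels, filters = 1, 256, 256, 3, 32

# Output of TimeDistributed(Conv2D(filters=32, ..., padding='same')):
conv_out = (seq_length, width, height, filters)

# Target of Reshape((seq_length, height * width * n_channels)):
target = (seq_length, height * width * n_channels)

print(np.prod(conv_out))  # 2097152 elements (32 channels)
print(np.prod(target))    # 196608 elements (3 channels) -> sizes differ
```

So the Reshape target would presumably need to account for the 32 filters, i.e. (seq_length, width * height * 32).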

I'd appreciate some help on how to properly connect the CNN and LSTM layers. Thank you!


Answer 1:


One possible solution is setting the LSTM input to be of shape (num_pixels, cnn_features). In your particular case, with a CNN of 32 filters, the LSTM would receive input of shape (256*256, 32):

cnn_features = 32

inp = tf.keras.layers.Input(shape=(256, 256, 3))
x = tf.keras.layers.Conv2D(filters=cnn_features,
                           activation='relu',
                           kernel_size=(2, 2),
                           padding='same')(inp)
x = tf.keras.layers.Reshape((256*256, cnn_features))(x)
x = tf.keras.layers.LSTM(units=128,
                         activation='tanh',
                         return_sequences=False)(x)
out = tf.keras.layers.Dense(1)(x)
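For completeness, a minimal sketch wrapping the layers above into a trainable model (assuming TensorFlow 2.x). Note that the Reshape turns every pixel into one timestep, so the LSTM runs over 256*256 = 65,536 steps per sample, which can be very slow to train:

```python
import tensorflow as tf

cnn_features = 32

inp = tf.keras.layers.Input(shape=(256, 256, 3))
x = tf.keras.layers.Conv2D(filters=cnn_features,
                           activation='relu',
                           kernel_size=(2, 2),
                           padding='same')(inp)
# Treat every pixel as one timestep carrying cnn_features features.
x = tf.keras.layers.Reshape((256 * 256, cnn_features))(x)
x = tf.keras.layers.LSTM(units=128,
                         activation='tanh',
                         return_sequences=False)(x)
out = tf.keras.layers.Dense(1)(x)

model = tf.keras.Model(inp, out)
model.compile(optimizer='adam', loss='mse')  # regression target: one scalar
print(model.output_shape)  # (None, 1)
```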


Source: https://stackoverflow.com/questions/62169725/building-cnn-lstm-in-keras-for-a-regression-problem-what-are-proper-shapes
