What's the input of each LSTM layer in a stacked LSTM network?

Submitted by 萝らか妹 on 2020-07-08 03:12:26

Question


I'm having some difficulty understanding the input-output flow between layers in a stacked LSTM network. Say I have created a stacked LSTM network like the one below:

from keras.models import Sequential
from keras.layers import LSTM

# parameters
time_steps = 10
features = 2
input_shape = [time_steps, features]
batch_size = 32

# model
model = Sequential()
model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
model.add(LSTM(32, input_shape=input_shape))

where our stacked LSTM network consists of 2 LSTM layers with 64 and 32 hidden units respectively. In this scenario, we expect that at each time step the 1st LSTM layer, LSTM(64), will pass as input to the 2nd LSTM layer, LSTM(32), a tensor of shape [batch_size, time_steps, hidden_unit_length], representing the hidden state of the 1st LSTM layer at each time step. What confuses me is:

  1. Does the 2nd LSTM layer, LSTM(32), receive as its input X(t) the hidden state of the 1st layer, LSTM(64), which has the shape [batch_size, time_steps, hidden_unit_length], and pass it through its own hidden network, in this case consisting of 32 nodes? (A shape check is sketched right after this list.)
  2. If the first is true, why is the input_shape of the 1st layer, LSTM(64), and of the 2nd, LSTM(32), the same, when the 2nd only processes the hidden state of the 1st layer? Shouldn't the 2nd layer's input_shape in our case be [32, 10, 64]?
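
For concreteness, here is a minimal sketch, assuming the tf.keras API (the original code above uses standalone Keras, but the behaviour is the same), of how one can inspect what the 1st layer actually emits. A sub-model cut off after the 1st LSTM exposes exactly the tensor the 2nd LSTM receives:

import numpy as np
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import LSTM

time_steps, features, batch_size = 10, 2, 32

model = Sequential()
model.add(LSTM(64, input_shape=(time_steps, features), return_sequences=True))
model.add(LSTM(32))

# A sub-model that stops after the 1st LSTM exposes its raw output.
first = Model(inputs=model.inputs, outputs=model.layers[0].output)
x = np.random.rand(batch_size, time_steps, features).astype("float32")

print(first.predict(x).shape)  # (32, 10, 64): one 64-dim hidden state per time step
print(model.predict(x).shape)  # (32, 32): only the 2nd LSTM's final hidden state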

I found an LSTM visualization titled "LSTM workings" (linked in the original post) very helpful, but it doesn't cover stacked LSTM networks.

Any help would be highly appreciated. Thanks!


Answer 1:


The input_shape argument is only required for the first layer. Each subsequent layer takes the output of the previous layer as its input, so any input_shape value it is given is ignored.

The model below

from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32))

represents the architecture shown in the answer's diagram: the first LSTM returns its full output sequence, and the second consumes it and returns only its final output. You can verify this from model.summary():

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_26 (LSTM)               (None, 5, 64)             17152     
_________________________________________________________________
lstm_27 (LSTM)               (None, 32)                12416     
=================================================================
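
As a sanity check on those parameter counts: an LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias, for 4 * (input_dim * units + units * units + units) parameters in total. A quick computation reproduces the table:

def lstm_params(input_dim, units):
    # 4 gates, each with: kernel (input_dim x units),
    # recurrent kernel (units x units), and bias (units)
    return 4 * (input_dim * units + units * units + units)

print(lstm_params(2, 64))   # 17152 -> matches lstm_26
print(lstm_params(64, 32))  # 12416 -> matches lstm_27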

Replacing the line

model.add(LSTM(32))

with

model.add(LSTM(32, input_shape=(1000000, 200000)))

will still give you the same architecture (verify using model.summary()), because input_shape is ignored: the layer takes as its input the tensor output by the previous layer.
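
Another way to see this: once the model is built, the 2nd layer reports the input shape it was actually built with. A minimal sketch, assuming the Keras 2 / tf.keras API (where built layers expose an input_shape property):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32, input_shape=(1000000, 200000)))  # this argument is ignored

# Built against the 1st layer's output, not the absurd input_shape above.
print(model.layers[1].input_shape)  # (None, 5, 64)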

And if you need a sequence-to-sequence architecture like the one diagrammed in the original answer, you should be using the code:

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32, return_sequences=True))

which produces the following summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_32 (LSTM)               (None, 5, 64)             17152     
_________________________________________________________________
lstm_33 (LSTM)               (None, 5, 32)             12416     
=================================================================
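
As a quick check that the model now emits one 32-dimensional vector per time step, you can push a dummy batch through it (a sketch assuming tf.keras and NumPy):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM

model = Sequential()
model.add(LSTM(64, return_sequences=True, input_shape=(5, 2)))
model.add(LSTM(32, return_sequences=True))

x = np.random.rand(8, 5, 2).astype("float32")  # a dummy batch of 8 sequences
print(model.predict(x).shape)                  # (8, 5, 32): one output per step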


Source: https://stackoverflow.com/questions/55385906/whats-the-input-of-each-lstm-layer-in-a-stacked-lstm-network
