Setting the hidden state for each minibatch with different hidden sizes and multiple LSTM layers in Keras

Submitted by 天大地大妈咪最大 on 2021-02-07 08:02:21


I created an LSTM using Keras with TensorFlow as the backend. Before a minibatch with num_steps of 96 is passed to training, the hidden state of the LSTM is set to the true values of a previous time step.

First the parameters and data:

import numpy as np

batch_size = 10
num_steps = 96
num_input = num_output = 2
hidden_size = 8
# X_train, Y_train, X_test and Y_test are loaded elsewhere
X_train = np.array(X_train).reshape(-1, num_steps, num_input)
Y_train = np.array(Y_train).reshape(-1, num_steps, num_output)
X_test = np.array(X_test).reshape(-1, num_steps, num_input)
Y_test = np.array(Y_test).reshape(-1, num_steps, num_output)

The Keras model consists of two LSTM layers and one layer to trim the output to num_output which is 2:

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

model = Sequential()
model.add(LSTM(hidden_size, batch_input_shape=(batch_size, num_steps, num_input),
               return_sequences=True, stateful=True))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(TimeDistributed(Dense(num_output, activation='softmax')))

model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])

The generator, as well as the training (hidden_states[x] has shape (2,)):

from keras import backend as K

def gen_data():
    x = np.zeros((batch_size, num_steps, num_input))
    y = np.zeros((batch_size, num_steps, num_output))
    while True:
        for i in range(batch_size):
            # hidden_states[x] has shape (2,)
            model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx])
            x[i, :, :] = X_train[gen_data.current_idx]
            y[i, :, :] = Y_train[gen_data.current_idx]
            gen_data.current_idx += 1
        yield x, y
gen_data.current_idx = 0

for epoch in range(100):
    model.fit_generator(gen_data(), steps_per_epoch=len(X_train)//batch_size, epochs=1,
                        validation_data=None, max_queue_size=1, shuffle=False)
    gen_data.current_idx = 0

This code does not give me an error, but I have two questions about it:

1) Inside the generator I set the hidden state of the LSTM, model.layers[0].states[0], to a variable built from hidden_states[gen_data.current_idx], which has shape (2,). Why is this possible for an LSTM with a hidden size greater than 2?

2) The values in hidden_states[gen_data.current_idx] could also be an output from the Keras model. Does it make sense for a two-layer LSTM to set the hidden state in this way?


States in LSTM

An LSTM is made up of gates which calculate the cell state and hidden state.

In the figure, the top arrow leaving the right side of the LSTM is the cell state (c_t) and the bottom arrow is the hidden state (h_t). Both are the result of gated manipulation, and each has the same size as the hidden_size of the LSTM. Every unrolled step (with its corresponding input X) produces its own states. For an LSTM, the state of the layer therefore consists of two tensors: the hidden state h_t of shape (batch_size x hidden_size) and the cell state c_t of shape (batch_size x hidden_size).

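import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

batch_size = 2
num_steps = 5
num_input = num_output = 1
hidden_size = 8

inputs = Input(batch_shape=(batch_size, num_steps, num_input))
lstm, state_h, state_c = LSTM(hidden_size, return_state=True, return_sequences=True)(inputs)
model = Model(inputs=inputs, outputs=[state_h, state_c])

# Two arrays, h_t and c_t, each of shape (batch_size, hidden_size)
print(model.predict(np.zeros((batch_size, num_steps, num_input))))
# state_size lists the sizes of h_t and c_t; both equal hidden_size
print(model.layers[1].cell.state_size)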

Note: A GRU or simple RNN has no cell state, only a hidden state, so its state is just h_t of shape (batch_size x hidden_size).
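The shapes described above can also be checked without Keras. The following is a minimal NumPy sketch of one LSTM time step, not the Keras implementation: the function name lstm_step, the random weights, and the gate ordering are assumptions made for illustration only. It shows that h_t and c_t both come out with shape (batch_size, hidden_size), whatever num_input is.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U and b hold the four gates stacked side by side."""
    z = x_t @ W + h_prev @ U + b          # (batch_size, 4 * hidden_size)
    i, f, g, o = np.split(z, 4, axis=1)   # input, forget, candidate, output (order assumed)
    c_t = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    h_t = sigmoid(o) * np.tanh(c_t)
    return h_t, c_t

batch_size, num_steps, num_input, hidden_size = 2, 5, 1, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(num_input, 4 * hidden_size))    # input weights (random, illustrative)
U = rng.normal(size=(hidden_size, 4 * hidden_size))  # recurrent weights (random, illustrative)
b = np.zeros(4 * hidden_size)

x = rng.normal(size=(batch_size, num_steps, num_input))
h = np.zeros((batch_size, hidden_size))
c = np.zeros((batch_size, hidden_size))
for t in range(num_steps):
    h, c = lstm_step(x[:, t, :], h, c, W, U, b)

print(h.shape, c.shape)  # (2, 8) (2, 8)
```

A state of shape (2,) therefore cannot stand in for either tensor when hidden_size is 8.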


Keras implementation of LSTM

Keras Docs:

the number of state tensors is 1 (for RNN and GRU) or 2 (for LSTM).

Illustrated Guide to LSTM and GRU

Feeding states

In your example, layers[0] refers to the first LSTM and layers[1] refers to the second LSTM. If your intention is to initialise the cell state (c_t) of the nth batch from the cell state of the (n-1)th, i.e. the previous, batch, there are two options:

  • Do it the way you are doing it in the generator, but use states[1] if you want c_t and states[0] for h_t. Similarly, use layers[0] for the first LSTM and layers[1] for the second LSTM. However, use the set_value method instead of assigning a new variable. See edit below.

  • Use Keras stateful=True: with stateful set to True, the LSTM states are not reset after every batch. So if you have a batch with 5 data samples (each of some sequence length), you will get a cell state for each of the 5 data samples. With stateful set to True, these states are used to initialise the cell states of the next batch.
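To illustrate what carrying states between batches amounts to, here is a minimal NumPy sketch. It deliberately swaps the LSTM for a plain tanh RNN to stay short; the helper rnn_batch and the random weights are assumptions for illustration, not Keras code. Carrying the final state of sample i into the next batch gives the same result as running the concatenated sequence of sample i in one go.

```python
import numpy as np

def rnn_batch(x, h0, W, U):
    """Run a plain tanh RNN over one batch and return the final hidden state."""
    h = h0
    for t in range(x.shape[1]):
        h = np.tanh(x[:, t, :] @ W + h @ U)
    return h

batch_size, num_steps, num_input, hidden_size = 5, 4, 2, 8
rng = np.random.default_rng(0)
W = rng.normal(size=(num_input, hidden_size))
U = rng.normal(size=(hidden_size, hidden_size))
batches = [rng.normal(size=(batch_size, num_steps, num_input)) for _ in range(3)]
zeros = np.zeros((batch_size, hidden_size))

# Stateful behaviour: row i of each batch starts from the final state
# that row i reached in the previous batch.
h_stateful = zeros
for x in batches:
    h_stateful = rnn_batch(x, h_stateful, W, U)

# Reference: the three batches joined along the time axis, started from zeros.
h_long = rnn_batch(np.concatenate(batches, axis=1), zeros, W, U)

print(h_stateful.shape)
print(np.allclose(h_stateful, h_long))
```

This is also why shuffle=False matters in the question's training call: the pairing between sample i in one batch and sample i in the next has to be preserved for the carried state to be meaningful.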


The method set_value should be used to set the value of a tensor variable. The assignment model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx]) is valid because all it does is rebind states[0], which was pointing to a variable of shape (batch_size x hidden_size), to a new variable of a different shape. It does not change the value of the existing tensor variable; it makes states[0] point to a new tensor variable with different dimensions.

Test Code:

 print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
 model.layers[0].states[0]= K.variable(np.random.randn(10,2))
 print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))


<tf.Variable 'lstm_18/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f8812e6ee10
<tf.Variable 'Variable_2:0' shape=(10, 2) dtype=float32_ref> 0x7f881269afd0

As you can see, these are two different variables. The correct way to do this is:

 print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
 K.set_value(model.layers[0].states[0], np.random.randn(10,8))
 print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))


<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70
<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70

Once your code is fixed in this way, the call

K.set_value(model.layers[0].states[0], np.random.randn(10,2))

will throw an error, because the shape of the tensor and the shape of the value you are setting do not match.
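The difference between rebinding and updating in place can be reproduced with plain NumPy as an analogy. In this sketch the list states stands in for layer.states, and set_value_in_place is a hypothetical helper that mimics the shape check; neither is part of Keras.

```python
import numpy as np

batch_size, hidden_size = 10, 8

def set_value_in_place(buffer, value):
    """Hypothetical helper: overwrite buffer's contents, insisting on equal shapes."""
    value = np.asarray(value)
    if value.shape != buffer.shape:
        raise ValueError(f"shape {value.shape} does not match {buffer.shape}")
    buffer[...] = value

# Stand-in for layer.states: a list holding one (batch_size, hidden_size) buffer.
states = [np.zeros((batch_size, hidden_size))]
original = states[0]

# Rebinding: the list slot points to a new object and nothing checks its shape.
states[0] = np.random.randn(batch_size, 2)
print(states[0] is original, states[0].shape)  # False (10, 2)

# In-place update: the same buffer is kept, so the shapes have to match.
states[0] = original
set_value_in_place(states[0], np.random.randn(batch_size, hidden_size))
print(states[0] is original, states[0].shape)  # True (10, 8)

try:
    set_value_in_place(states[0], np.random.randn(batch_size, 2))
except ValueError as err:
    print("rejected:", err)
```

For the generator in the question, this means each state has to be supplied as a full (batch_size, hidden_size) array, one row per sample in the batch, rather than as a single vector of shape (2,).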

