Question
I created an LSTM using Keras with TensorFlow as the backend. Before each minibatch with num_steps of 96 is fed to training, the hidden state of the LSTM is set to the true values of a previous time step.
First the parameters and data:
batch_size = 10
num_steps = 96
num_input = num_output = 2
hidden_size = 8
X_train = np.array(X_train).reshape(-1, num_steps, num_input)
Y_train = np.array(Y_train).reshape(-1, num_steps, num_output)
X_test = np.array(X_test).reshape(-1, num_steps, num_input)
Y_test = np.array(Y_test).reshape(-1, num_steps, num_output)
The Keras model consists of two LSTM layers and one layer that reduces the output to num_output, which is 2:
model = Sequential()
model.add(LSTM(hidden_size, batch_input_shape=(batch_size, num_steps, num_input),
               return_sequences=True, stateful=True))
model.add(LSTM(hidden_size, return_sequences=True))
model.add(Dropout(0.2))
model.add(TimeDistributed(Dense(num_output, activation='softmax')))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['accuracy'])
The generator, as well as the training (hidden_states[x] has shape (2,)):
def gen_data():
    x = np.zeros((batch_size, num_steps, num_input))
    y = np.zeros((batch_size, num_steps, num_output))
    while True:
        for i in range(batch_size):
            # hidden_states[x] has shape (2,)
            model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx])
            x[i, :, :] = X_train[gen_data.current_idx]
            y[i, :, :] = Y_train[gen_data.current_idx]
            gen_data.current_idx += 1
        yield x, y
gen_data.current_idx = 0

for epoch in range(100):
    model.fit_generator(gen_data(), len(X_train)//batch_size, 1,
                        validation_data=None, max_queue_size=1, shuffle=False)
    gen_data.current_idx = 0
This code does not give me an error, but I have two questions about it:
1) Inside the generator I set the hidden state of the LSTM, model.layers[0].states[0], to a variable built from hidden_states[gen_data.current_idx], which has shape (2,). Why is this possible for an LSTM whose hidden size is greater than 2?
2) The values in hidden_states[gen_data.current_idx] could also be an output of the Keras model. Does it make sense to set the hidden state of a two-layer LSTM in this way?
Answer 1:
States in LSTM
An LSTM is made up of gates which compute the cell state and the hidden state.
In the figure, the top arrow coming out of the right of the LSTM is the cell state (c_t) and the bottom arrow is the hidden state (h_t). The cell state is the result of gated manipulation, and its size is the same as the hidden_size of the LSTM. Every unrolling (with its corresponding input X) produces its own state. In the case of an LSTM, the state is composed of two values: the hidden state (h_t) of shape (batch_size, hidden_size) and the cell state (c_t) of shape (batch_size, hidden_size).
import numpy as np
from keras.layers import Input, LSTM
from keras.models import Model

batch_size = 2
num_steps = 5
num_input = num_output = 1
hidden_size = 8

inputs = Input(batch_shape=(batch_size, num_steps, num_input))
lstm, state_h, state_c = LSTM(hidden_size, return_state=True, return_sequences=True)(inputs)
model = Model(inputs=inputs, outputs=[state_h, state_c])
print(model.predict(np.zeros((batch_size, num_steps, num_input))))
print(model.layers[1].cell.state_size)
Note: In the case of a GRU/RNN there is no cell state, only a hidden state, so the state in that case is just h_t of size (batch_size, hidden_size).
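As a quick check of that note, here is a minimal sketch (reusing the parameters from the snippet above) showing that a GRU with return_state=True returns only a single state tensor:

import numpy as np
from keras.layers import Input, GRU
from keras.models import Model

batch_size, num_steps, num_input, hidden_size = 2, 5, 1, 8

inputs = Input(batch_shape=(batch_size, num_steps, num_input))
# GRU has no cell state, so return_state=True yields just one tensor, h_t
gru_out, state_h = GRU(hidden_size, return_state=True, return_sequences=True)(inputs)
gru_model = Model(inputs=inputs, outputs=[state_h])
print(gru_model.predict(np.zeros((batch_size, num_steps, num_input))).shape)  # (2, 8)
print(gru_model.layers[1].cell.state_size)  # 8, a single state of size hidden_size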
Reference:
Keras implementation of LSTM
Keras Docs: "the number of state tensors is 1 (for RNN and GRU) or 2 (for LSTM)."
Illustrated Guide to LSTM and GRU
Feeding states
In your example, layers[0] refers to the 1st LSTM and layers[1] refers to the 2nd LSTM. If your intention is to initialise the cell state (c_t) of the nth batch from the cell state of the (n-1)th, i.e. the previous batch, there are two options:
1) The way you are doing it in the generator, but use states[1] if you want c_t and states[0] for h_t. Similarly, use layers[0] for the 1st LSTM and layers[1] for the 2nd LSTM. But use the set_value method instead; see the edit below.
2) Use Keras stateful=True: with stateful set to true, the LSTM states are not reset after every batch. So if you have a batch with 5 data samples (each of some sequence length), you will get a cell state for each of the 5 data samples. With stateful set to true, these states are used to initialise the cell state for the next batch (a minimal sketch of this follows below).
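To make option 2) concrete, here is a minimal sketch of a stateful LSTM (random data, parameters as in the question); states carry over from one batch to the next until you call reset_states(), typically between epochs:

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

batch_size, num_steps, num_input, num_output, hidden_size = 10, 96, 2, 2, 8

stateful_model = Sequential()
stateful_model.add(LSTM(hidden_size, batch_input_shape=(batch_size, num_steps, num_input),
                        return_sequences=True, stateful=True))
stateful_model.add(TimeDistributed(Dense(num_output)))
stateful_model.compile(loss='mean_squared_error', optimizer='adam')

x = np.random.randn(4 * batch_size, num_steps, num_input)
y = np.random.randn(4 * batch_size, num_steps, num_output)

# Sample i of batch n is treated as the continuation of sample i of batch n-1:
# the final h_t/c_t of each batch initialise the next batch automatically.
for epoch in range(3):
    stateful_model.fit(x, y, batch_size=batch_size, epochs=1, shuffle=False)
    stateful_model.reset_states()  # clear the carried-over states between epochs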
Edit:
The method set_value should be used to set the value of a tensor variable. The assignment model.layers[0].states[0] = K.variable(value=hidden_states[gen_data.current_idx]) runs without error because it rebinds states[0], which was pointing to a variable of shape (batch_size, hidden_size), to a new variable of shape (batch_size, 2). It does not change the value of the original tensor variable; it just makes the attribute point to a new tensor variable of a different dimension.
Test Code:
print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
model.layers[0].states[0]= K.variable(np.random.randn(10,2))
print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
Output
<tf.Variable 'lstm_18/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f8812e6ee10
<tf.Variable 'Variable_2:0' shape=(10, 2) dtype=float32_ref> 0x7f881269afd0
As you can see, they are two different variables. The correct way to do this is:
print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
K.set_value(model.layers[0].states[0], np.random.randn(10,8))
print (model.layers[0].states[0], hex(id(model.layers[0].states[0])))
Output
<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70
<tf.Variable 'lstm_20/Variable:0' shape=(10, 8) dtype=float32_ref> 0x7f881138eb70
If your code is changed to use set_value in this way, then
K.set_value(model.layers[0].states[0], np.random.randn(10,2))
will throw an error, because the shape of the tensor and the shape of the value you are setting do not match.
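Applied back to the generator in the question, a corrected sketch would keep the original state tensors and overwrite their values once per batch, since each state tensor already covers all batch_size samples. Here hidden_states_h and hidden_states_c are hypothetical arrays holding one (batch_size, hidden_size) array per batch, replacing the shape-(2,) entries; model, X_train, Y_train and the parameters are reused from the question:

from keras import backend as K
import numpy as np

def gen_data():
    x = np.zeros((batch_size, num_steps, num_input))
    y = np.zeros((batch_size, num_steps, num_output))
    while True:
        # Overwrite the existing state tensors in place instead of rebinding them.
        # states[0] is h_t and states[1] is c_t; both are (batch_size, hidden_size).
        K.set_value(model.layers[0].states[0], hidden_states_h[gen_data.current_batch])
        K.set_value(model.layers[0].states[1], hidden_states_c[gen_data.current_batch])
        gen_data.current_batch += 1
        for i in range(batch_size):
            x[i, :, :] = X_train[gen_data.current_idx]
            y[i, :, :] = Y_train[gen_data.current_idx]
            gen_data.current_idx += 1
        yield x, y
gen_data.current_idx = 0
gen_data.current_batch = 0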
Source: https://stackoverflow.com/questions/55534572/setting-the-hidden-state-for-each-minibatch-with-different-hidden-sizes-and-mult