Question
I can understand how to multiply the Dense layer weights in order to get the predicted output, but how can I interpret the weight matrices of an LSTM model?
Here are some toy examples (don't mind the fitting, this is just about the matrix multiplication).
Dense example:
from keras.models import Model
from keras.layers import Input, Dense, LSTM
import numpy as np
np.random.seed(42)
X = np.array([[1, 2], [3, 4]])
I = Input(X.shape[1:])
D = Dense(2)(I)
linear_model = Model(inputs=[I], outputs=[D])
print('linear_model.predict:\n', linear_model.predict(X))
weight, bias = linear_model.layers[1].get_weights()
print('bias + X @ weights:\n', bias + X @ weight)
Output:
linear_model.predict:
[[ 3.10299015 0.46077788]
[ 7.12412453 1.17058146]]
bias + X @ weights:
[[ 3.10299003 0.46077788]
[ 7.12412441 1.17058146]]
LSTM example:
X = X.reshape(*X.shape, 1)
I = Input(X.shape[1:])
L = LSTM(2)(I)
lstm_model = Model(inputs=[I], outputs=[L])
print('lstm_model.predict:\n', lstm_model.predict(X))
print('weights I don\'t understand:\n', lstm_model.layers[1].get_weights())
Output:
lstm_model.predict:
[[ 0.27675897 0.15364291]
[ 0.49197391 0.04097994]]
weights I don't understand:
[array([[ 0.11056691, 0.03153521, -0.78214532, 0.04079598, 0.32587671,
0.72789955, 0.58123612, -0.57094401]], dtype=float32),
array([[-0.16277026, -0.43958429, 0.30112407, 0.07443386, 0.70584315,
0.17196879, -0.14703408, 0.36694485],
[-0.03672785, -0.55035251, 0.27230391, -0.45381972, -0.06399836,
-0.00104597, 0.14719161, -0.62441903]], dtype=float32),
array([ 0., 0., 1., 1., 0., 0., 0., 0.], dtype=float32)]
Answer 1:
You can get the names of the weights from the layer's weight tensors:
weight_tensors = lstm_model.layers[1].weights
weight_names = list(map(lambda x: x.name, weight_tensors))
print(weight_names)
Output:
['lstm_1/kernel:0', 'lstm_1/recurrent_kernel:0', 'lstm_1/bias:0']
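As a quick sanity check (my addition, not part of the original answer): with units=2 and an input feature dimension of 1, as in the question, the three arrays should have shapes (1, 8), (2, 8) and (8,), because the kernel, recurrent kernel, and bias each stack the four gates along their last axis:
for w in lstm_model.layers[1].get_weights():
    print(w.shape)
# expected: (1, 8), (2, 8), (8,)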
From the source code you can see that those weights are split into the weights for the input, forget, cell state, and output gates:
self.kernel_i = self.kernel[:, :self.units]
self.kernel_f = self.kernel[:, self.units: self.units * 2]
self.kernel_c = self.kernel[:, self.units * 2: self.units * 3]
self.kernel_o = self.kernel[:, self.units * 3:]
self.recurrent_kernel_i = self.recurrent_kernel[:, :self.units]
self.recurrent_kernel_f = self.recurrent_kernel[:, self.units: self.units * 2]
self.recurrent_kernel_c = self.recurrent_kernel[:, self.units * 2: self.units * 3]
self.recurrent_kernel_o = self.recurrent_kernel[:, self.units * 3:]
if self.use_bias:
    self.bias_i = self.bias[:self.units]
    self.bias_f = self.bias[self.units: self.units * 2]
    self.bias_c = self.bias[self.units * 2: self.units * 3]
    self.bias_o = self.bias[self.units * 3:]
else:
    self.bias_i = None
    self.bias_f = None
    self.bias_c = None
    self.bias_o = None
The usage of those weights depends on the implementation. I always refer to Christopher Olah's blog for the formulation.
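For a concrete check, here is a minimal sketch (mine, not from the original answer) of how the three arrays combine to reproduce lstm_model.predict, assuming the standard LSTM equations and the default activations: tanh for the candidate cell state and output, and hard_sigmoid (older standalone Keras) or the ordinary sigmoid (tf.keras) for the gates.
import numpy as np

def hard_sigmoid(x):
    # Keras' piecewise-linear hard_sigmoid; older standalone Keras uses it as the
    # default recurrent activation, while tf.keras defaults to the plain sigmoid.
    return np.clip(0.2 * x + 0.5, 0.0, 1.0)

def manual_lstm(X, kernel, recurrent_kernel, bias,
                recurrent_activation=hard_sigmoid, activation=np.tanh):
    # Step an LSTM over X of shape (samples, timesteps, features) and return the
    # last hidden state, which is what LSTM(units) outputs by default.
    W_i, W_f, W_c, W_o = np.split(kernel, 4, axis=1)
    U_i, U_f, U_c, U_o = np.split(recurrent_kernel, 4, axis=1)
    b_i, b_f, b_c, b_o = np.split(bias, 4)
    units = kernel.shape[1] // 4
    h = np.zeros((X.shape[0], units))   # hidden state
    c = np.zeros((X.shape[0], units))   # cell state
    for t in range(X.shape[1]):
        x_t = X[:, t, :]
        i = recurrent_activation(x_t @ W_i + h @ U_i + b_i)    # input gate
        f = recurrent_activation(x_t @ W_f + h @ U_f + b_f)    # forget gate
        c = f * c + i * activation(x_t @ W_c + h @ U_c + b_c)  # cell state update
        o = recurrent_activation(x_t @ W_o + h @ U_o + b_o)    # output gate
        h = o * activation(c)                                   # hidden state
    return h

kernel, recurrent_kernel, bias = lstm_model.layers[1].get_weights()
print('manual LSTM:\n', manual_lstm(X, kernel, recurrent_kernel, bias))
print('lstm_model.predict:\n', lstm_model.predict(X))
If the numbers do not line up, switching recurrent_activation between hard_sigmoid and the logistic sigmoid (1 / (1 + np.exp(-x))) usually accounts for the difference between Keras versions.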
Source: https://stackoverflow.com/questions/46953279/understand-keras-lstm-weights