Question
I have a multi-class sequence labeling problem where the number of time steps varies from sample to sample. To use an LSTM with variable-length input, I applied zero padding and masking to my input.
I've read here that mask propagation stops after an LSTM layer with return_sequences=False, and that part confused me.
My question is: would it be okay to use an LSTM with return_sequences=False and still have the loss calculated correctly for the architecture below?
from tensorflow.keras.layers import LSTM, Masking, Dense
from tensorflow.keras import models, losses
import numpy as np

np.random.seed(42)

num_samples = 4
timesteps = 6
num_feats = 1
num_classes = 5

# Padded train data: zero out the trailing time steps of three samples
X_train = np.random.random([num_samples, timesteps, num_feats]).astype(np.float32)
X_train[0, 5, 0] = 0   # sample 0: last step is padding
X_train[1, 3:, 0] = 0  # sample 1: steps 3-5 are padding
X_train[2, 4:, 0] = 0  # sample 2: steps 4-5 are padding
print(X_train)

model = models.Sequential()
model.add(Masking(mask_value=0, input_shape=(timesteps, num_feats)))
model.add(LSTM(32, return_sequences=False))
model.add(Dense(10))
model.add(Dense(num_classes, activation='softmax'))  # probabilities for categorical_crossentropy
print(model.summary())

model.compile(loss=losses.categorical_crossentropy, optimizer='adam', metrics=["accuracy"])
[[[0.37454012]
  [0.9507143 ]
  [0.7319939 ]
  [0.5986585 ]
  [0.15601864]
  [0.        ]]

 [[0.05808361]
  [0.8661761 ]
  [0.601115  ]
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.83244264]
  [0.21233912]
  [0.18182497]
  [0.1834045 ]
  [0.        ]
  [0.        ]]

 [[0.43194503]
  [0.29122913]
  [0.6118529 ]
  [0.13949387]
  [0.29214466]
  [0.36636186]]]
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
masking (Masking)            (None, 6, 1)              0
_________________________________________________________________
lstm (LSTM)                  (None, 32)                4352
_________________________________________________________________
dense (Dense)                (None, 10)                330
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 55
=================================================================
Total params: 4,737
Trainable params: 4,737
Non-trainable params: 0
_________________________________________________________________
None
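For reference, here is a small sanity check I sketched (my own assumption about how Masking should behave, not something taken from the docs; it reuses X_train and num_feats from above): with Masking in front, an LSTM with return_sequences=False should produce the same output whether it sees the padded sequence or the same sequence cut at its true length.

# Sanity-check sketch (my assumption): if the mask works, the padded and
# truncated versions of the same sample should yield the same output vector.
from tensorflow.keras.layers import LSTM, Masking
from tensorflow.keras import models
import numpy as np

check = models.Sequential([
    Masking(mask_value=0, input_shape=(None, num_feats)),  # None: accept any length
    LSTM(32, return_sequences=False),
])

padded = X_train[1:2]            # sample 1, padded to 6 time steps
truncated = X_train[1:2, :3, :]  # same sample cut at its true length of 3 steps
print(np.allclose(check.predict(padded), check.predict(truncated)))  # expect True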
Or, should the mask be propagated from the LSTM to the subsequent layers with return_sequences=True, and then a layer like TimeDistributed be used to achieve a many-to-one prediction? I've found this approach, but its author was not sure about it.
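For context, this is roughly how I picture that alternative (only a sketch; the mask-aware GlobalAveragePooling1D used to collapse the time axis is my own guess, not something taken from that post):

# Alternative sketch: keep the mask flowing with return_sequences=True,
# apply a per-timestep Dense via TimeDistributed, then collapse the time axis.
# GlobalAveragePooling1D in tf.keras honours the mask, so padded steps should
# be excluded from the average (this choice of reduction is my own assumption).
from tensorflow.keras.layers import (LSTM, Masking, Dense, TimeDistributed,
                                     GlobalAveragePooling1D)
from tensorflow.keras import models, losses

alt_model = models.Sequential()
alt_model.add(Masking(mask_value=0, input_shape=(timesteps, num_feats)))
alt_model.add(LSTM(32, return_sequences=True))           # mask keeps propagating
alt_model.add(TimeDistributed(Dense(10)))                # per-timestep projection
alt_model.add(GlobalAveragePooling1D())                  # mask-aware reduction over time
alt_model.add(Dense(num_classes, activation='softmax'))  # one prediction per sequence
alt_model.compile(loss=losses.categorical_crossentropy, optimizer='adam',
                  metrics=["accuracy"])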
Source: https://stackoverflow.com/questions/64395745/how-to-use-many-to-one-lstm-with-variable-length-input-on-keras