How to use many-to-one LSTM with variable-length input on Keras?


Question


I have a multi-class sequence labeling problem where the number of time steps varies across samples. To use an LSTM with variable-length input, I applied zero padding and masking to my input.
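For context, here is roughly how such padding can be produced with Keras' pad_sequences (a minimal sketch; the ragged toy data below is illustrative, not my real pipeline):

import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Three toy samples with 4, 2, and 3 time steps, one feature each
ragged = [
    [[0.1], [0.2], [0.3], [0.4]],
    [[0.5], [0.6]],
    [[0.7], [0.8], [0.9]],
]

# Append zeros so every sample has the same number of time steps
padded = pad_sequences(ragged, padding='post', dtype='float32', value=0.0)
print(padded.shape)  # (3, 4, 1)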

I've read here that mask propagation stops after an LSTM layer with return_sequences=False, and that part confused me.
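One way to check this directly: in eager mode, Keras attaches the propagated mask to a layer's output as the _keras_mask attribute, so you can watch where it disappears. A minimal sketch (the layer sizes are arbitrary):

import numpy as np
from tensorflow.keras.layers import Masking, LSTM

x = np.zeros((1, 6, 1), dtype='float32')
x[0, :4, 0] = 0.5  # the last two time steps stay zero (padding)

masked = Masking(mask_value=0.0)(x)
print(masked._keras_mask)                 # [[ True  True  True  True False False]]

seq = LSTM(8, return_sequences=True)(masked)
print(seq._keras_mask)                    # mask still propagated, one flag per time step

vec = LSTM(8, return_sequences=False)(masked)
print(getattr(vec, '_keras_mask', None))  # None: the mask is consumed here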

My question is: would it be okay to use an LSTM with return_sequences=False to calculate the loss correctly for the architecture below?

from tensorflow.keras.layers import LSTM, Masking, Dense
from tensorflow.keras import models, losses
import numpy as np

np.random.seed(42)
num_samples = 4
timesteps = 6
num_feats = 1
num_classes = 5

# Padded training data: zero out trailing time steps to simulate post-padding
X_train = np.random.random([num_samples, timesteps, num_feats]).astype(np.float32)
X_train[0, 5, 0] = 0   # sample 0: last time step is padding
X_train[1, 3:, 0] = 0  # sample 1: last three time steps are padding
X_train[2, 4:, 0] = 0  # sample 2: last two time steps are padding
print(X_train)

model = models.Sequential()
model.add(Masking(mask_value=0.0, input_shape=(timesteps, num_feats)))  # skip all-zero time steps
model.add(LSTM(32, return_sequences=False))                             # one output vector per sample
model.add(Dense(10, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))  # class probabilities for categorical_crossentropy
model.summary()
model.compile(loss=losses.categorical_crossentropy, optimizer='adam', metrics=["accuracy"])

This prints the padded input and the model summary:
[[[0.37454012]
  [0.9507143 ]
  [0.7319939 ]
  [0.5986585 ]
  [0.15601864]
  [0.        ]]

 [[0.05808361]
  [0.8661761 ]
  [0.601115  ]
  [0.        ]
  [0.        ]
  [0.        ]]

 [[0.83244264]
  [0.21233912]
  [0.18182497]
  [0.1834045 ]
  [0.        ]
  [0.        ]]

 [[0.43194503]
  [0.29122913]
  [0.6118529 ]
  [0.13949387]
  [0.29214466]
  [0.36636186]]]
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
masking (Masking)            (None, 6, 1)              0         
_________________________________________________________________
lstm (LSTM)                  (None, 32)                4352      
_________________________________________________________________
dense (Dense)                (None, 10)                330       
_________________________________________________________________
dense_1 (Dense)              (None, 5)                 55        
=================================================================
Total params: 4,737
Trainable params: 4,737
Non-trainable params: 0
_________________________________________________________________
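Assuming one label per sample, training would then look something like this (the y_train labels here are made up purely for illustration):

from tensorflow.keras.utils import to_categorical

# Hypothetical per-sample class labels, one-hot encoded for categorical_crossentropy
y_train = to_categorical(np.array([0, 2, 1, 4]), num_classes=num_classes)
model.fit(X_train, y_train, epochs=2, batch_size=2)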

Or should the mask be propagated from the LSTM to the subsequent layers with return_sequences=True, and then a layer like TimeDistributed be used to achieve many-to-one prediction? I've found this approach, but its author was not sure about it.
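For reference, my reading of that alternative is sketched below. Note that TimeDistributed alone still yields one output per time step, so some mask-aware reduction is needed to get back to one prediction per sample; here I swap in GlobalAveragePooling1D, which respects the mask and averages only the unmasked steps (this is my interpretation, not a tested answer):

from tensorflow.keras.layers import GlobalAveragePooling1D

alt = models.Sequential()
alt.add(Masking(mask_value=0.0, input_shape=(timesteps, num_feats)))
alt.add(LSTM(32, return_sequences=True))  # mask keeps propagating per time step
alt.add(GlobalAveragePooling1D())         # collapses time, averaging only unmasked steps
alt.add(Dense(10, activation='relu'))
alt.add(Dense(num_classes, activation='softmax'))
alt.compile(loss=losses.categorical_crossentropy, optimizer='adam', metrics=['accuracy'])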

Source: https://stackoverflow.com/questions/64395745/how-to-use-many-to-one-lstm-with-variable-length-input-on-keras
