Question
I have two types of input sequences, where input1 contains 50 values and input2 contains 25 values. I tried to combine these two sequence types with an LSTM model in the functional API. However, since the lengths of my two input sequences differ, I am wondering whether what I am currently doing is the right way. My code is as follows:
from tensorflow.keras.layers import Input, LSTM, Dense, concatenate
from tensorflow.keras.models import Model

input1 = Input(shape=(50, 1))
x1 = LSTM(100)(input1)                     # last hidden state only: (None, 100)
input2 = Input(shape=(25, 1))
x2 = LSTM(50)(input2)                      # last hidden state only: (None, 50)
x = concatenate([x1, x2])                  # (None, 150)
x = Dense(200)(x)
output = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[input1, input2], outputs=output)
More specifically I want to know how to combine two LSTM layers that have different input lengths (i.e. 50 and 25 in my case). I am happy to provide more details if needed.
Answer 1:
Actually, your problem is pretty normal in tasks like NLP where sequences have different lengths. In your code you discard all of the per-timestep outputs by keeping the default return_sequences=False, which is not common practice and normally results in a lower-performing model.
Note: There is no ultimate solution in neural network architecture design
Here is what I can suggest.
Method 1 (No custom layer required)
You can use the same latent dimension in both LSTMs, stack their outputs along the time axis, and treat the result as one big hidden-state tensor.
input1 = Input(shape=(50,1))
x1 = LSTM(100, return_sequences=True)(input1)
input2 = Input(shape=(25,1))
x2 = LSTM(100, return_sequences=True)(input2)
x = concatenate([x1,x2], axis=1)
# output dimension = (None, 75, 100)
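The concatenated tensor is still a sequence, so it has to be reduced to a vector before the final dense head. Below is a minimal sketch of one way to finish the model from here; the summarising LSTM(64) and the dense sizes are my own illustrative choices, not part of the answer.
# summarise the combined (None, 75, 100) sequence into a single vector
x = LSTM(64)(x)                            # illustrative size, not from the answer
x = Dense(200, activation='relu')(x)
output = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[input1, input2], outputs=output)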
If you do not want to use the same latent dimension, what others do is add one more part, usually called a mapping layer, which consists of a stack of dense layers. This approach has more parameters, which means the model is harder to train.
input1 = Input(shape=(50,1))
x1 = LSTM(100, return_sequences=True)(input1)
input2 = Input(shape=(25,1))
x2 = LSTM(50, return_sequences=True)(input2)
# normally we have more than 1 hidden layer
Map_x1 = Dense(75)(x1)
Map_x2 = Dense(75)(x2)
x = concatenate([Map_x1, Map_x2], axis=1)
# output dimension = (None, 75, 75)
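The comment above notes that the mapping part normally has more than one hidden layer. Here is a minimal sketch of such a deeper mapping block; the layer sizes and ReLU activations are illustrative assumptions, not taken from the answer.
# a deeper mapping block per branch (sizes are illustrative)
Map_x1 = Dense(128, activation='relu')(x1)
Map_x1 = Dense(75)(Map_x1)                 # (None, 50, 75)
Map_x2 = Dense(128, activation='relu')(x2)
Map_x2 = Dense(75)(Map_x2)                 # (None, 25, 75)
x = concatenate([Map_x1, Map_x2], axis=1)  # (None, 75, 75)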
Or flatten both outputs:
input1 = Input(shape=(50,1))
x1 = LSTM(100, return_sequences=True)(input1)
input2 = Input(shape=(25,1))
x2 = LSTM(50, return_sequences=True)(input2)
# normally we have more than 1 hidden layer
flat_x1 = Flatten()(x1)                    # (None, 50 * 100) = (None, 5000)
flat_x2 = Flatten()(x2)                    # (None, 25 * 50) = (None, 1250)
x = concatenate([flat_x1, flat_x2], axis=1)
# output dimension = (None, 6250)
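With the flattened version both branches are already 2-D, so the dense head from the question can be attached directly. A minimal sketch; the ReLU activation and compile settings are my own assumptions.
x = Dense(200, activation='relu')(x)
output = Dense(1, activation='sigmoid')(x)
model = Model(inputs=[input1, input2], outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy')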
Method 2 (custom layer required)
Create your own custom layer and use an attention mechanism that produces an attention vector, then use that attention vector as the representation of your LSTM output tensor. What others do to achieve better performance is to combine the last hidden state of the LSTM (the only output you use in your current model) with the attention vector as the representation.
Note: According to research, different types of attention give almost the same performance, so I recommend "Scaled Dot-Product Attention" because it is faster to compute.
input1 = Input(shape=(50,1))
x1 = LSTM(100, return_sequences=True)(input1)
input2 = Input(shape=(25,1))
x2 = LSTM(50, return_sequences=True)(input2)
rep_x1 = custom_layer()(x1)
rep_x2 = custom_layer()(x2)
x = concatenate([rep_x1, rep_x2], axis=1)
# output (None, (length rep_x1+length rep_x2))
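The answer leaves custom_layer() unspecified. Below is a minimal sketch of one possible implementation: an attention-pooling layer with a single learned query and scaled dot-product scoring, as the answer recommends. The class name AttentionPooling and all implementation details are my own assumptions, not the answer author's code.
import tensorflow as tf
from tensorflow.keras.layers import Layer

class AttentionPooling(Layer):
    """Collapses (batch, timesteps, features) into (batch, features) with
    scaled dot-product attention against a single learned query vector."""
    def build(self, input_shape):
        d = int(input_shape[-1])
        self.query = self.add_weight(name='query', shape=(d,),
                                     initializer='glorot_uniform',
                                     trainable=True)
        super().build(input_shape)

    def call(self, inputs):
        d = tf.cast(tf.shape(inputs)[-1], tf.float32)
        # scaled dot-product scores over the time axis: (batch, timesteps)
        scores = tf.einsum('btd,d->bt', inputs, self.query) / tf.sqrt(d)
        weights = tf.nn.softmax(scores, axis=1)
        # attention-weighted sum of the timestep outputs: (batch, features)
        return tf.einsum('bt,btd->bd', weights, inputs)
With this layer, rep_x1 = AttentionPooling()(x1) has shape (None, 100) and rep_x2 = AttentionPooling()(x2) has shape (None, 50), so the concatenation gives (None, 150); you can also concatenate the last LSTM hidden state with this vector, as suggested above.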
Source: https://stackoverflow.com/questions/60679680/how-to-combine-two-lstm-layers-with-different-input-sizes-in-keras