Adding Attention on top of simple LSTM layer in Tensorflow 2.0

Submitted by 左心房为你撑大大i on 2020-01-14 04:05:48

Question


I have a simple network of one LSTM and two Dense layers as such:

model = tf.keras.Sequential()
model.add(layers.LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(layers.Dense(20, activation='sigmoid'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error')

The model is trained on data with 3 inputs (normalized to the range 0 to 1.0) and 1 binary output, for classification. The data is a time series in which consecutive time steps are related.

    var1(t)   var2(t)   var3(t)  var4(t)
0  0.448850  0.503847  0.498571      0.0
1  0.450992  0.503480  0.501215      0.0
2  0.451011  0.506655  0.503049      0.0

The model is trained as such:

history = model.fit(train_X, train_y, epochs=2800, batch_size=40, validation_data=(test_X, test_y), verbose=2, shuffle=False)
model.summary()

Giving the model summary:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm (LSTM)                  (None, 20)                1920      
_________________________________________________________________
dense (Dense)                (None, 20)                420       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 21        
=================================================================
Total params: 2,361
Trainable params: 2,361
Non-trainable params: 0
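As a sanity check, these parameter counts can be reproduced by hand using the standard Keras LSTM formula (4 gates, each with input weights, recurrent weights, and a bias):

```python
n_features, units = 3, 20

# LSTM: 4 gates, each with (input + recurrent) weight matrices plus bias
lstm_params = 4 * ((n_features + units) * units + units)
print(lstm_params)  # 1920

# Dense(20) on a 20-dim input: weights + biases
dense_params = units * 20 + 20
print(dense_params)  # 420

# Dense(1) output layer
out_params = 20 * 1 + 1
print(out_params)  # 21

print(lstm_params + dense_params + out_params)  # 2361
```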

The model works reasonably well. Now I am trying to replace the Dense(20) layer with an Attention layer. All the examples, tutorials, etc. online (including the TF docs) are for seq2seq models with an embedding layer at the input layer. I understand the seq2seq implementations in TF v1.x but I cannot find any documentation for what I am trying to do. I believe in the new API (v2.0) I need to do something like this:

lstm = layers.LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2]), return_sequences=True)
lstm = tf.keras.layers.Bidirectional(lstm)
attention = layers.Attention() # this does not work

model = tf.keras.Sequential()
model.add(lstm)
model.add(attention)
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error')

And of course I get the error "Attention layer must be called on a list of inputs, namely [query, value] or [query, value, key]"

I do not understand how to solve this in TF 2.0 for this case (time-series data with fixed-length input). Any ideas on adding attention to this type of problem are welcome.


Answer 1:


You must call the attention layer on a list of inputs, like this:

attention = layers.Attention()([query, value])  # query and value are tensors, e.g. LSTM outputs

See the tf.keras.layers.Attention API documentation for details.
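One way to apply this to the question's fixed-length time-series setup is the Functional API with self-attention. The sketch below is an assumption-laden illustration, not the only correct approach: the input shape (10, 3) is a stand-in for (train_X.shape[1], train_X.shape[2]), and using the LSTM outputs as both query and value (self-attention) is one reasonable choice when there is no separate query source.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical input shape: 10 time steps, 3 features per step
inputs = layers.Input(shape=(10, 3))

# return_sequences=True keeps one output vector per time step,
# which the attention layer needs
x = layers.LSTM(20, return_sequences=True)(inputs)

# Self-attention: the LSTM outputs serve as both query and value
attn = layers.Attention()([x, x])

# Collapse the time dimension to a fixed-size vector for the classifier head
pooled = layers.GlobalAveragePooling1D()(attn)
outputs = layers.Dense(1, activation='sigmoid')(pooled)

model = tf.keras.Model(inputs, outputs)
model.compile(loss='binary_crossentropy', optimizer='adam')
```

A Sequential model cannot express this directly, because Attention takes a list of inputs; the Functional API makes the [query, value] call explicit.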



来源:https://stackoverflow.com/questions/58966874/adding-attention-on-top-of-simple-lstm-layer-in-tensorflow-2-0
