I am trying to implement a sequence-to-sequence model with attention using the Keras library. The block diagram of the model is as follows:
Based on your block diagram, it looks like you pass the same attention vector to the decoder at every timestep. In that case you need RepeatVector to copy the vector once per output timestep, turning the 2D attention tensor into the 3D tensor the decoder expects:
from keras.layers import Input, RepeatVector, concatenate

# ... (encoder output and your Attention layer come from the model above)
attention = Attention(MAX_LENGTH_Input)(encoder)        # (?, 1024): one attention vector per sample
attention = RepeatVector(MAX_LENGTH_Output)(attention)  # (?, 10, 1024): same vector copied at each timestep
decoder_input = Input(shape=(MAX_LENGTH_Output, vocab_size_output))
merge = concatenate([attention, decoder_input])         # (?, 10, 1024 + 8281)
# ...
Note that this repeats one and the same attention vector at every timestep, so each decoder step sees identical context.
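
If you want to sanity-check the shape transformation in isolation, here is a minimal runnable sketch. The timestep count 10 and feature size 1024 are placeholders taken from the shape comments above, and it assumes the standalone keras package:

from keras.layers import Input, RepeatVector
from keras.models import Model
import numpy as np

vec = Input(shape=(1024,))            # 2D input: (batch, 1024)
repeated = RepeatVector(10)(vec)      # 3D output: (batch, 10, 1024)
model = Model(vec, repeated)

x = np.random.rand(2, 1024).astype("float32")
print(model.predict(x).shape)         # (2, 10, 1024): every timestep holds the same vector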