To implement attention in an encoder-decoder model, we take the hidden vector of an LSTM unit of the decoder and perform several operations on it, together with the encoder outputs, to compute the attention weights.
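
As a minimal sketch of those operations, the snippet below computes attention weights from the decoder's current hidden state and the encoder outputs using Bahdanau-style additive scoring; the framework (PyTorch), the class name `AdditiveAttention`, and the dimension arguments are illustrative assumptions, not the article's specific implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention: one common way to get the weights."""
    def __init__(self, enc_dim, dec_dim, attn_dim):
        super().__init__()
        self.W_enc = nn.Linear(enc_dim, attn_dim, bias=False)  # projects encoder outputs
        self.W_dec = nn.Linear(dec_dim, attn_dim, bias=False)  # projects decoder hidden state
        self.v = nn.Linear(attn_dim, 1, bias=False)            # scores each source position

    def forward(self, dec_hidden, enc_outputs):
        # dec_hidden:  (batch, dec_dim)          current decoder LSTM hidden state
        # enc_outputs: (batch, src_len, enc_dim) encoder LSTM output at every source step
        scores = self.v(torch.tanh(
            self.W_enc(enc_outputs) + self.W_dec(dec_hidden).unsqueeze(1)
        )).squeeze(-1)                            # (batch, src_len) unnormalised scores
        weights = F.softmax(scores, dim=-1)       # attention weights, sum to 1 over source
        context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)  # weighted sum
        return context, weights
```

The resulting weights say how much each source position should contribute, and the context vector (the weighted sum of encoder outputs) is then fed back into the decoder at that step.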