In the PyTorch Seq2Seq sample, attention is calculated with a linear layer followed by a softmax.
embedded = self.embedding(input).view(1, 1, -1)
embedded = self.dropout(embedded)
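For context, here is a minimal sketch of that attention step, assuming a decoder along the lines of the tutorial's AttnDecoderRNN: the concatenated embedding and hidden state go through a linear layer, a softmax turns the scores into attention weights, and a batched matrix multiply applies them to the encoder outputs. The class name AttnDecoderSketch and the hyperparameters (hidden_size, output_size, max_length, dropout_p) are placeholders, not names from the tutorial itself.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnDecoderSketch(nn.Module):
    def __init__(self, hidden_size, output_size, max_length, dropout_p=0.1):
        super().__init__()
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.dropout = nn.Dropout(dropout_p)
        # Linear layer that scores each encoder position from the
        # concatenated (embedded input, decoder hidden state) pair.
        self.attn = nn.Linear(hidden_size * 2, max_length)

    def attention(self, input, hidden, encoder_outputs):
        embedded = self.embedding(input).view(1, 1, -1)
        embedded = self.dropout(embedded)
        # Linear layer followed by softmax -> attention weights over
        # the encoder time steps.
        attn_weights = F.softmax(
            self.attn(torch.cat((embedded[0], hidden[0]), 1)), dim=1)
        # Weighted sum of encoder outputs: the context vector fed to
        # the rest of the decoder.
        attn_applied = torch.bmm(attn_weights.unsqueeze(0),
                                 encoder_outputs.unsqueeze(0))
        return attn_weights, attn_applied

So the "attention" here is not a learned dot-product score; it is a feed-forward layer whose output size is the maximum source length, normalized by softmax.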