I am confused about the calculation in self-attention, or in attention generally.
Let's talk about self-attention first. I have:
x -> [batch_size, query_len, em