I'm running a standard transformer model on an MT task.
I found something strange while I was looking at the code.
In the code below, that "residual" tenso