I'm stepping through the code at https://www.tensorflow.org/tutorials/text/nmt_with_attention as a learning exercise, and I'm confused about when the loss function is called.