Why are the predictions so bad even though the model's loss is so low?

I was training a Transformer model to translate English sentences into German. After less than one epoch of training, the loss had dropped to 0.009. This was