Why does this 'residual' tensor behave as if it had been deep-copied in PyTorch?

耶瑟儿~ 2021-02-04 03:20

I'm running a standard transformer model on an MT task.

I found something strange while I was looking at the code.

In the code below, that "residual" tenso
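The original snippet is cut off above, so the following is not the asker's code. It is only a minimal sketch of the usual residual pattern, assuming the code does something like `residual = x` followed by `x = layer(x)`; the `layer` here is a hypothetical stand-in for a transformer sub-layer. Under that assumption, no deepcopy is involved: the assignment rebinds the name `x` to a new tensor, while `residual` keeps pointing at the old one. An in-place operation is shown for contrast.

```python
import torch

# Hypothetical stand-in for a transformer sub-layer (not from the question).
layer = torch.nn.Linear(4, 4)

x = torch.ones(2, 4)
residual = x                 # no copy: both names refer to the same tensor object
same_before = residual is x

x = layer(x)                 # layer(x) returns a NEW tensor; the name x is rebound to it
same_after = residual is x
residual_unchanged = torch.equal(residual, torch.ones(2, 4))

# Contrast: an in-place op mutates the shared tensor, so the alias sees the change.
y = torch.zeros(2, 4)
alias = y
y.add_(1.0)                  # in-place update, no rebinding
alias_changed = torch.equal(alias, torch.ones(2, 4))

out = x + residual           # typical residual connection
```

If the real code uses in-place ops such as `x += ...` or `x.add_(...)` on the same tensor, `residual` would change too, so it may be worth checking which form is actually used.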
