Why can the Multi-Head Self-Attention Mechanism learn more features than a single head does?

生来不讨喜 2021-01-29 17:57

Recently I have been studying the Transformer, which contains a mechanism termed Multi-Head Self-Attention. After browsing many blogs about how it works in detail, I still do not understand why multiple heads can learn more features than a single head does.
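To make the question concrete, here is a minimal sketch of how I understand the mechanism (a simplified PyTorch-style implementation; the parameter names such as `embed_dim` and `num_heads` are my own and this is not the exact code of any library):

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Minimal sketch: the embedding is split across heads, each head runs its
    own scaled dot-product attention, and the results are concatenated."""
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Each head effectively gets its own slice of these projections,
        # so different heads can learn different attention patterns.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_dim)
        b, t, d = x.shape

        def split(z: torch.Tensor) -> torch.Tensor:
            # Reshape to (batch, num_heads, seq_len, head_dim).
            return z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)

        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        # Scaled dot-product attention, computed independently per head.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = torch.softmax(scores, dim=-1)
        out = attn @ v  # (batch, num_heads, seq_len, head_dim)
        # Concatenate the heads and mix them with a final linear layer.
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.out_proj(out)
```

Given this structure, my question is why running several smaller attention heads in parallel captures more features than one large head with the same total dimensionality.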
