Why is attention using multiplication instead of subtraction?

谎友^ 2021-01-29 05:04

Attention tries to find close words by multiplying the embedding vectors: when the dot product is large, the embedding vectors are close, so the tokens are similar. But why multiply the vectors instead of subtracting them, i.e. measuring the distance between the embeddings?
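To make the comparison concrete, here is a minimal NumPy sketch of both scoring rules: the standard scaled dot-product attention (multiplication), and a hypothetical subtraction-based variant that scores keys by negative squared distance. The function names, the distance variant, and the toy shapes are illustrative assumptions, not from the original post.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard attention: scores come from multiplying (dot products) queries with keys."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)           # large dot product -> vectors point the same way
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V

def negative_distance_attention(Q, K, V):
    """Hypothetical subtraction-based variant: score = -||q - k||^2 (not the standard formulation)."""
    diff = Q[:, None, :] - K[None, :, :]      # pairwise q - k differences
    scores = -np.sum(diff ** 2, axis=-1)      # closer vectors -> higher score
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Tiny demo with random embeddings
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 query tokens, embedding dim 4
K = rng.normal(size=(5, 4))   # 5 key tokens
V = rng.normal(size=(5, 4))
print(scaled_dot_product_attention(Q, K, V))
print(negative_distance_attention(Q, K, V))
```

Note that the two are closely related: since -||q - k||^2 = 2 q·k - ||q||^2 - ||k||^2, and the -||q||^2 term is constant across keys for a given query (so it cancels in the softmax), the subtraction-based score reduces to a dot-product score plus a per-key norm penalty.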
