Is Self Attention Sensitive to Input Size?

Asked by 难免孤独 on 2021-02-09 12:23

I have a hierarchical model, where self-attention (from the Transformer) is used to encode each word in a sentence, and then another self-attention block encodes each sentence from those word representations. Is self-attention sensitive to the size (length) of its input?
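To make the setup concrete, here is a minimal sketch (assuming PyTorch's `nn.MultiheadAttention`; the dimensions and variable names are hypothetical) showing that a single self-attention block accepts sentences of different lengths, since its learned projections act per token:

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 64, 4  # hypothetical sizes for illustration
attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

# The block's learned projections (W_q, W_k, W_v, W_o) are applied per
# token, so the parameter count does not depend on sequence length.
short_sent = torch.randn(1, 5, embed_dim)   # a 5-word sentence
long_sent = torch.randn(1, 50, embed_dim)   # a 50-word sentence

out_short, _ = attn(short_sent, short_sent, short_sent)
out_long, _ = attn(long_sent, long_sent, long_sent)
print(out_short.shape)  # torch.Size([1, 5, 64])
print(out_long.shape)   # torch.Size([1, 50, 64])
```

Note that while the weights are length-independent, the softmax over attention scores is taken across all positions, so the attention distribution itself does vary with input length.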
