I have a hierarchical model where self-attention (from the Transformer) is used to encode each word in a sentence, and then another self-attention block encodes each sentence — roughly like the sketch below.
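
To make the setup concrete, here is a minimal PyTorch sketch of what I mean. The class name, the mean-pooling step that turns word representations into one vector per sentence, and the omission of positional encodings are all placeholder assumptions, not my actual code:

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, vocab_size, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        word_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        sent_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Word-level self-attention: contextualizes the tokens within each sentence.
        self.word_encoder = nn.TransformerEncoder(word_layer, num_layers)
        # Sentence-level self-attention: contextualizes sentence vectors within the document.
        self.sent_encoder = nn.TransformerEncoder(sent_layer, num_layers)

    def forward(self, tokens):
        # tokens: (batch, num_sentences, num_words) of token ids
        # (positional encodings and padding masks omitted for brevity)
        b, s, w = tokens.shape
        x = self.embed(tokens.view(b * s, w))       # (b*s, w, d_model)
        x = self.word_encoder(x)                    # word-level self-attention
        sent_vecs = x.mean(dim=1).view(b, s, -1)    # pool words -> one vector per sentence (assumed mean pooling)
        return self.sent_encoder(sent_vecs)         # sentence-level self-attention

# Example: a batch of 2 documents, 3 sentences each, 5 words per sentence.
doc = torch.randint(0, 1000, (2, 3, 5))
out = HierarchicalEncoder(vocab_size=1000)(doc)
print(out.shape)  # torch.Size([2, 3, 256])
```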