What is used to train a self-attention mechanism?

Submitted by 徘徊边缘 on 2020-01-24 16:14:27

Question


I've been trying to understand self-attention, but nothing I've found explains the concept well at a high level.

Let's say we use self-attention in an NLP task, so our input is a sentence.

Then self-attention can be used to measure how "important" each word in the sentence is for every other word.

The problem is that I do not understand how that "importance" is measured. Important for what?

What exactly is the goal vector the weights in the self-attention algorithm are trained against?


Answer 1:


Connecting language with its underlying meaning is called grounding. A sentence like "The ball is on the table" evokes an image that can be reproduced with multimodal learning. Multimodal means that different kinds of words are involved, for example events, action words, subjects, and so on. A self-attention mechanism maps input vectors to output vectors, with a neural network between them. The output vector of the neural network refers to the grounded situation.
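To make the input-vector-to-output-vector mapping concrete, here is a minimal NumPy sketch of scaled dot-product self-attention in the Transformer style. The projection matrices `W_q`, `W_k`, `W_v` and the toy shapes are illustrative assumptions, not something from the original answer. Note there is no separate "goal vector" that the attention weights are trained against: the projections are learned jointly with the rest of the network by backpropagating the downstream task loss (e.g. translation or language-modeling loss).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token vectors.

    X: (seq_len, d_model) input embeddings.
    W_q, W_k, W_v: learned projections; in a real model they are trained
    end-to-end by the task loss, not against any explicit target.
    """
    Q = X @ W_q   # queries
    K = X @ W_k   # keys
    V = X @ W_v   # values
    d_k = Q.shape[-1]
    # "importance" of word j for word i = softmax-normalized dot product
    scores = Q @ K.T / np.sqrt(d_k)      # (seq_len, seq_len)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# toy usage: 4 tokens, 8-dim embeddings and projections
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
print(attn.round(2))  # row i: how much word i attends to every word j
```

Each row of `attn` is a probability distribution over the sentence, so the "importance" of word j for word i is simply how much of word i's output vector is composed from word j's value vector.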

Let us take a short example. We need a 300x200 pixel image, a sentence in natural language, and a parser. The parser works in both directions: it can convert text to an image, meaning the sentence "The ball is on the table" gets converted into the 300x200 image, but it can also parse a given image and extract the natural-language sentence from it. Self-attention learning is a bootstrapping technique for learning and using this grounded relationship. That means verifying existing language models, learning new ones, and predicting future system states.



Source: https://stackoverflow.com/questions/53172502/what-is-used-to-train-a-self-attention-mechanism
