attention-model

How is the self-attention mechanism in Transformers able to learn how the words are related to each other?

久未见 submitted on 2020-01-25 07:05:51
Question: Given the sentence "The animal didn't cross the street because it was too tired", how is self-attention able to map the pronoun "it" to the word "animal" with a higher score than to the word "street"? I'm wondering whether that might be a consequence of the word-embedding vectors fed into the network, which somehow already encapsulate some degree of distance between the words. Source: https://stackoverflow.com/questions/58855564/how-is-the-self-attention-mechanism-in-transformers-able-to-learn-how-the-words
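The mapping is not read off the embeddings alone: the score between two positions is the scaled dot product of a learned query projection of one token and a learned key projection of the other, and those projection matrices are trained end-to-end. A minimal numpy sketch of one self-attention head is below; the matrices Wq, Wk, Wv and the toy shapes are illustrative, not taken from the question.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """One self-attention head: every token scores every other token."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # learned projections
        scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise compatibility
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)            # softmax over keys
        return w @ V, w                                  # w[i, j]: attention of token i on token j

    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 16))                         # 8 toy token embeddings
    Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
    _, w = self_attention(X, Wq, Wk, Wv)
    print(w[6])                                          # where one token (say "it") attends

With untrained random matrices these weights are meaningless; during training, gradients from the task loss push the query of "it" toward the key of "animal" whenever resolving that coreference lowers the loss.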

LSTM-Attention Layer Network dimensions for classification task

感情迁移 submitted on 2020-01-25 06:49:23
Question: I figured I'd build an attention model, but got confused (again) about each layer's dimensions. Say I have 90 documents, each composed of 200 sentence-vectors. The sentence-vectors are of size 500 (each sentence embedded as 1x500). The task is classification of each document, and the sentence-vectors are already embedded!

    # Creating random features
    xx = np.random.randint(100, size=(90, 200, 500))
    y = np.random.randint(2, size=(90, 1))

In the end, the attention-layer should return the …
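For reference, here is one way the shapes could line up in Keras, a sketch rather than the asker's actual model: an LSTM over the 200 sentence positions, a learned score per position, and a softmax-weighted average as the document vector. All layer sizes are illustrative.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers

    xx = np.random.randint(100, size=(90, 200, 500)).astype("float32")
    y = np.random.randint(2, size=(90, 1))

    inp = layers.Input(shape=(200, 500))              # (sentences, embedding) per document
    h = layers.LSTM(64, return_sequences=True)(inp)   # (batch, 200, 64)
    scores = layers.Dense(1)(h)                       # one scalar score per sentence
    alpha = layers.Softmax(axis=1)(scores)            # attention weights, (batch, 200, 1)
    doc = layers.Flatten()(layers.Dot(axes=1)([alpha, h]))  # weighted sum -> (batch, 64)
    out = layers.Dense(1, activation="sigmoid")(doc)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(xx, y, epochs=1, batch_size=16)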

What is used to train a self-attention mechanism?

徘徊边缘 submitted on 2020-01-24 16:14:27
Question: I've been trying to understand self-attention, but nothing I've found explains the concept well at a high level. Say we use self-attention in an NLP task, so our input is a sentence. Self-attention can then be used to measure how "important" each word in the sentence is for every other word. The problem is that I don't understand how that "importance" is measured. Important for what? What exactly is the goal vector the weights in the self-attention algorithm are trained …
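The short answer is that there is no separate attention target: the query/key/value projections are ordinary trainable parameters, and "importance" is whatever weighting happens to reduce the downstream loss. A hedged TensorFlow 2 sketch (toy shapes, made-up classification task) shows the gradient path:

    import tensorflow as tf

    d = 16
    Wq = tf.Variable(tf.random.normal((d, d)))    # attention parameters ...
    Wk = tf.Variable(tf.random.normal((d, d)))
    Wv = tf.Variable(tf.random.normal((d, d)))
    Wout = tf.Variable(tf.random.normal((d, 2)))  # ... and a task head

    x = tf.random.normal((1, 10, d))              # one sentence: 10 token embeddings
    label = tf.constant([1])                      # toy task label

    with tf.GradientTape() as tape:
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        a = tf.nn.softmax(q @ tf.transpose(k, [0, 2, 1]) / d ** 0.5, axis=-1)
        pooled = tf.reduce_mean(a @ v, axis=1)
        logits = pooled @ Wout
        loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=label, logits=logits)

    # Wq, Wk, Wv receive gradients only through the task loss: no attention labels
    grads = tape.gradient(loss, [Wq, Wk, Wv, Wout])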

What does the “source hidden state” refer to in the Attention Mechanism?

爱⌒轻易说出口 submitted on 2020-01-24 10:29:06
Question: The attention weights are computed as: [formula shown as an image in the original post] I want to know what h_s refers to. In the TensorFlow code, the encoder RNN returns a tuple:

    encoder_outputs, encoder_state = tf.nn.dynamic_rnn(...)

I would think h_s should be the encoder_state, but github/nmt gives a different answer:

    # attention_states: [batch_size, max_time, num_units]
    attention_states = tf.transpose(encoder_outputs, [1, 0, 2])

    # Create an attention mechanism
    attention_mechanism = tf.contrib.seq2seq.LuongAttention(num …
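What the nmt code is saying is that h_s, the "source hidden states", is the whole sequence of per-timestep encoder outputs (one hidden state per source position) used as the attention memory, not the single final encoder_state. A hedged TF 1.x sketch in the style of github/nmt follows; names like encoder_cell, encoder_inputs, source_lengths, and num_units are placeholders.

    # encoder_outputs: one hidden state per source position (time-major here)
    encoder_outputs, encoder_state = tf.nn.dynamic_rnn(
        encoder_cell, encoder_inputs, sequence_length=source_lengths,
        time_major=True, dtype=tf.float32)

    # [max_time, batch, units] -> [batch, max_time, units]
    attention_states = tf.transpose(encoder_outputs, [1, 0, 2])

    # the memory argument is h_s: ALL source hidden states, not encoder_state
    attention_mechanism = tf.contrib.seq2seq.LuongAttention(
        num_units, attention_states,
        memory_sequence_length=source_lengths)

encoder_state is only the hidden state at the last timestep; attending over it alone would give the decoder nothing to align to.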

Adding Attention on top of simple LSTM layer in Tensorflow 2.0

拥有回忆 submitted on 2020-01-14 04:08:10
Question: I have a simple network of one LSTM and two Dense layers, as such:

    model = tf.keras.Sequential()
    model.add(layers.LSTM(20, input_shape=(train_X.shape[1], train_X.shape[2])))
    model.add(layers.Dense(20, activation='sigmoid'))
    model.add(layers.Dense(1, activation='sigmoid'))
    model.compile(loss='mean_squared_error')

It is trained on data with 3 inputs (normalized 0 to 1.0) and 1 binary output, for the purpose of classification. The data is time-series data, where there is a relation between time …
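One possible way to add attention here in stock TF 2.0, sketched with the functional API and assumed shapes: make the LSTM return the full sequence, let the built-in dot-product Attention layer attend the sequence to itself, then pool back to a vector. This is an illustration, not the only (or the asker's) design.

    import tensorflow as tf
    from tensorflow.keras import layers

    timesteps, features = 10, 3                         # stand-ins for train_X.shape[1:]
    inp = layers.Input(shape=(timesteps, features))
    seq = layers.LSTM(20, return_sequences=True)(inp)   # keep all timesteps
    att = layers.Attention()([seq, seq])                # dot-product self-attention
    vec = layers.GlobalAveragePooling1D()(att)          # collapse the time axis
    h = layers.Dense(20, activation='sigmoid')(vec)
    out = layers.Dense(1, activation='sigmoid')(h)
    model = tf.keras.Model(inp, out)
    # loss kept as in the question; binary_crossentropy is more usual for a binary label
    model.compile(loss='mean_squared_error')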

How to visualize attention LSTM using the keras-self-attention package?

霸气de小男生 submitted on 2019-12-28 18:44:54
Question: I'm using keras-self-attention to implement an attention LSTM in Keras. How can I visualize the attention part after training the model? This is a time-series forecasting case.

    from keras.models import Sequential
    from keras.layers import LSTM, Dense, Flatten
    from keras_self_attention import SeqSelfAttention

    model = Sequential()
    model.add(LSTM(activation='tanh', units=200, return_sequences=True,
                   input_shape=(TrainD[0].shape[1], TrainD[0].shape[2])))
    model.add(SeqSelfAttention())
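One hedged way to visualize it, assuming the package's SeqSelfAttention supports return_attention=True (which returns the weights as a second output): rebuild the network with the functional API, train it, then read the attention matrix out of a side model. The shapes below are placeholders for TrainD[0].shape[1:].

    import numpy as np
    import matplotlib.pyplot as plt
    from keras.models import Model
    from keras.layers import Input, LSTM, Dense, Flatten
    from keras_self_attention import SeqSelfAttention

    inp = Input(shape=(30, 4))                       # placeholder (timesteps, features)
    seq = LSTM(200, activation='tanh', return_sequences=True)(inp)
    att, weights = SeqSelfAttention(return_attention=True, name='attn')(seq)
    out = Dense(1)(Flatten()(att))
    model = Model(inp, out)
    model.compile(optimizer='adam', loss='mse')
    # ... fit the model on the forecasting data as usual ...

    viewer = Model(inp, weights)                     # shares the trained layers
    a = viewer.predict(np.random.rand(1, 30, 4))[0]  # (timesteps, timesteps)
    plt.imshow(a)                                    # row i: where timestep i attends
    plt.colorbar()
    plt.show()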

weighted mask / adjusting weights in keras

杀马特。学长 韩版系。学妹 submitted on 2019-12-24 14:02:02
Question: I want to provide a mask the same size as the input image and adjust the weights learned from the image according to this mask (similar to attention, but pre-computed for each image input). How can I do this with Keras (or TensorFlow)? Answer 1: Question: How can I add another feature layer to an image, such as a mask, and have the neural network take this new feature layer into account? Answer: The short answer is to add it as another colour channel to the image. If your image already has 3 colour …
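A minimal sketch of that suggestion, with made-up shapes: concatenate the per-image mask to the RGB channels so the first convolution sees it as a fourth input channel.

    import numpy as np
    import tensorflow as tf
    from tensorflow.keras import layers

    images = np.random.rand(8, 64, 64, 3).astype("float32")  # RGB batch
    masks = np.random.rand(8, 64, 64, 1).astype("float32")   # one mask per image
    x = np.concatenate([images, masks], axis=-1)              # (8, 64, 64, 4)

    model = tf.keras.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 4)),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),
    ])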

How to use previous output and hidden states from LSTM for the attention mechanism?

我是研究僧i submitted on 2019-12-24 00:59:32
Question: I am currently trying to code the attention mechanism from the paper "Effective Approaches to Attention-based Neural Machine Translation" (Luong, Pham, and Manning, 2015), using global attention with the dot score. However, I am unsure how to feed in the hidden and output states from the LSTM decoder. The issue is that the input of the LSTM decoder at time t depends on quantities that I need to compute using the output and hidden states from t-1. Here is the relevant part of the code:

    with tf …
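For reference, one decoder step of Luong global attention with the dot score can be sketched in plain numpy as below; all names and sizes are illustrative. It shows why step t's input needs step t-1's result: the attentional state h_tilde computed at t-1 is concatenated into the next decoder input (the paper's "input feeding").

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def luong_dot_step(h_t, encoder_states, Wc):
        """h_t: decoder hidden state (d,); encoder_states: (src_len, d)."""
        scores = encoder_states @ h_t            # dot score against every source state
        alpha = softmax(scores)                  # attention weights over source positions
        c_t = alpha @ encoder_states             # context vector (d,)
        h_tilde = np.tanh(Wc @ np.concatenate([c_t, h_t]))  # attentional hidden state
        return h_tilde, alpha

    d, src_len = 8, 5
    rng = np.random.default_rng(0)
    H = rng.normal(size=(src_len, d))            # toy encoder states
    Wc = rng.normal(size=(d, 2 * d))
    h_tilde, alpha = luong_dot_step(rng.normal(size=d), H, Wc)
    # at step t, the decoder input is [embedding_t ; h_tilde from step t-1]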