attention-model

Is there any way to convert a PyTorch tensor to a TensorFlow tensor?

Submitted by 我是研究僧i on 2021-02-11 16:39:22
Question: https://github.com/taoshen58/BiBloSA/blob/ec67cbdc411278dd29e8888e9fd6451695efc26c/context_fusion/self_attn.py#L29 I need to use multi_dimensional_attention from the above link, which is implemented in TensorFlow, but I am using PyTorch. Can I convert a PyTorch tensor to a TensorFlow tensor, or do I have to implement it in PyTorch? The code I am trying to use is here; I have to pass 'rep_tensor' as a TensorFlow tensor type, but I have a PyTorch tensor. def multi_dimensional_attention(rep_tensor, rep_mask
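
One common workaround, sketched below, is to hop through NumPy; this assumes the PyTorch tensor is on the CPU (or can be moved there) and that no gradients need to flow across the framework boundary. The variable names are illustrative.

    import torch
    import tensorflow as tf

    # Example PyTorch tensor with an illustrative (batch, seq_len, hidden) shape.
    rep_tensor_torch = torch.randn(2, 5, 8)

    # Detach from the autograd graph, move to CPU, and convert via NumPy,
    # which both frameworks can read directly.
    rep_tensor_tf = tf.convert_to_tensor(rep_tensor_torch.detach().cpu().numpy())

    print(rep_tensor_tf.shape)  # (2, 5, 8)

Gradients do not cross the NumPy bridge, so if the attention output has to be trained end to end inside a PyTorch model, reimplementing multi_dimensional_attention in PyTorch is the safer route.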

Can't set the attribute “trainable_weights”, likely because it conflicts with an existing read-only

Submitted by 假如想象 on 2021-01-29 06:10:42
Question: My code was running perfectly in Colab, but today it's not running. It says: Can't set the attribute "trainable_weights", likely because it conflicts with an existing read-only @property of the object. Please choose a different name. I am using an LSTM with an attention layer. class Attention(Layer): def __init__(self, **kwargs): self.init = initializers.get('normal') #self.input_spec = [InputSpec(ndim=3)] super(Attention, self).__init__(**kwargs) def build(self, input_shape): assert len(input
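
This error typically comes from a build() method that assigns to self.trainable_weights, which newer Keras versions expose as a read-only property. A minimal sketch of the usual fix, assuming the layer only needs one weight matrix and one bias and that the number of time steps is fixed (shapes are illustrative), is to let add_weight register the variables instead of assigning the list by hand:

    import tensorflow as tf
    from tensorflow.keras import layers, initializers

    class Attention(layers.Layer):
        def __init__(self, **kwargs):
            super(Attention, self).__init__(**kwargs)
            self.init = initializers.get('normal')

        def build(self, input_shape):
            assert len(input_shape) == 3  # (batch, time_steps, features)
            # add_weight registers the variables with the layer, so Keras tracks
            # them in trainable_weights automatically; no assignment is needed.
            self.W = self.add_weight(name='att_weight',
                                     shape=(input_shape[-1], 1),
                                     initializer=self.init, trainable=True)
            self.b = self.add_weight(name='att_bias',
                                     shape=(input_shape[1], 1),
                                     initializer='zeros', trainable=True)
            super(Attention, self).build(input_shape)

        def call(self, x):
            # Score each time step, softmax over time, return the weighted sum.
            e = tf.math.tanh(tf.tensordot(x, self.W, axes=1) + self.b)  # (batch, time, 1)
            a = tf.nn.softmax(e, axis=1)
            return tf.reduce_sum(x * a, axis=1)                         # (batch, features)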

Understanding Bahdanau's Attention Linear Algebra

Submitted by 冷暖自知 on 2021-01-06 03:25:57
Question: Bahdanau's additive attention is recognized as the second part of equation 4 in the image below. I am trying to figure out the shapes of the matrices w1, w2, ht, hs and v in order to figure out how this mechanism is used in this paper. Can ht and hs have different final dimensions, say (batch size, total units) and (batch size, time window)? Equation 8 in the paper mentioned above seems to be doing this. Equation 8 in that paper has the notation below: what does this expand to exactly?
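
For reference, the additive score being asked about is usually written as score(ht, hs) = v^T tanh(W1·ht + W2·hs). The shape sketch below uses illustrative sizes; the point is that ht and hs may have different feature dimensions as long as W1 and W2 both project onto the same number of attention units, and the softmax runs over the source time dimension of hs:

    import tensorflow as tf

    batch, src_len = 4, 7                        # illustrative sizes
    dec_units, enc_units, att_units = 16, 32, 10

    ht = tf.random.normal((batch, dec_units))            # decoder state at one step
    hs = tf.random.normal((batch, src_len, enc_units))   # all encoder states

    W1 = tf.keras.layers.Dense(att_units, use_bias=False)   # projects ht
    W2 = tf.keras.layers.Dense(att_units, use_bias=False)   # projects hs
    v = tf.keras.layers.Dense(1, use_bias=False)

    # Both projections land in att_units, so the broadcasted sum is well defined.
    score = v(tf.math.tanh(W1(ht)[:, None, :] + W2(hs)))  # (batch, src_len, 1)
    weights = tf.nn.softmax(score, axis=1)                # one weight per source step
    context = tf.reduce_sum(weights * hs, axis=1)         # (batch, enc_units)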

Sequence to Sequence - for time series prediction

Submitted by ▼魔方 西西 on 2020-05-25 07:53:47
Question: I've tried to build a sequence to sequence model to predict a sensor signal over time based on its first few inputs (see figure below). The model works OK, but I want to 'spice things up' and try to add an attention layer between the two LSTM layers. Model code: def train_model(x_train, y_train, n_units=32, n
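
One common way to wire attention between two LSTM layers in Keras is sketched below; the window sizes, unit counts, and the use of the built-in dot-product Attention layer are illustrative assumptions, not the poster's actual model:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_model(n_steps_in=30, n_steps_out=20, n_features=1, n_units=32):
        inp = layers.Input(shape=(n_steps_in, n_features))
        # The first LSTM returns the full sequence so attention can weight every step.
        enc = layers.LSTM(n_units, return_sequences=True)(inp)
        # Built-in dot-product attention over the encoder outputs (used here as self-attention).
        att = layers.Attention()([enc, enc])
        # The second LSTM consumes the attended sequence and feeds the forecast head.
        dec = layers.LSTM(n_units)(att)
        out = layers.Dense(n_steps_out)(dec)
        model = models.Model(inp, out)
        model.compile(optimizer='adam', loss='mse')
        return model

    build_model().summary()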

Hierarchical Attention Network - model.fit generates error 'ValueError: Input dimension mis-match'

Submitted by ぐ巨炮叔叔 on 2020-05-14 21:28:13
Question: For background, I am referring to the Hierarchical Attention Network used for sentiment classification. For code: my full code is posted below, but it is just a simple revision of the original code posted by the author at the link above; I explain my changes below. For training data: here. For word embeddings: this is the GloVe embedding, here. Key config: Keras 2.0.9, Scikit-Learn 0.19.1, Theano 0.9.0. The original code posted at the link above takes a 3D-shaped input, i.e., (review,
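
For orientation, the hierarchical model expects every sample to be a fixed grid of sentences by words, so the training data must be padded into a 3D array before model.fit; a mismatch between that array and the declared Input shapes is the usual source of this error. A minimal preprocessing sketch with illustrative limits (not the poster's exact configuration):

    import numpy as np

    MAX_SENTS, MAX_SENT_LENGTH = 15, 100   # illustrative limits

    def to_3d(reviews, word_index):
        """reviews: list of reviews, each a list of sentences, each a list of tokens."""
        data = np.zeros((len(reviews), MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')
        for i, review in enumerate(reviews):
            for j, sentence in enumerate(review[:MAX_SENTS]):
                for k, word in enumerate(sentence[:MAX_SENT_LENGTH]):
                    data[i, j, k] = word_index.get(word, 0)
        return data  # shape (reviews, sentences, words), as the hierarchical model expects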

How to visualize attention weights?

Submitted by 淺唱寂寞╮ on 2020-04-08 06:59:06
Question: Using this implementation, I have added attention to my RNN (which classifies the input sequences into two classes) as follows. visible = Input(shape=(250,)) embed = Embedding(vocab_size, 100)(visible) activations = keras.layers.GRU(250, return_sequences=True)(embed) attention = TimeDistributed(Dense(1, activation='tanh'))(activations) attention = Flatten()(attention) attention = Activation('softmax')(attention) attention = RepeatVector(250)(attention) attention = Permute([2, 1])(attention) sent
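
One way to inspect the learned weights is to name the softmax layer and build a second model that exposes its output, then plot the per-time-step weights. The sketch below assumes the softmax was created with name='attention_weights', that `model` is the trained classifier, and that x_test holds test sequences; all of these names are illustrative:

    import numpy as np
    import matplotlib.pyplot as plt
    from tensorflow.keras.models import Model

    # Assumes the softmax was built as:
    #   attention = Activation('softmax', name='attention_weights')(attention)
    att_model = Model(inputs=model.input,
                      outputs=model.get_layer('attention_weights').output)

    weights = att_model.predict(x_test[:1])[0]   # shape (250,), sums to 1

    plt.bar(np.arange(len(weights)), weights)    # one bar per input time step
    plt.xlabel('time step')
    plt.ylabel('attention weight')
    plt.show()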

Why is the embedding vector multiplied by a constant in the Transformer model?

Submitted by 本小妞迷上赌 on 2020-04-06 03:05:22
Question: I am learning to apply the Transformer model proposed in Attention Is All You Need, following the official TensorFlow tutorial Transformer model for language understanding. As the section Positional encoding says: Since this model doesn't contain any recurrence or convolution, positional encoding is added to give the model some information about the relative position of the words in the sentence. The positional encoding vector is added to the embedding vector. My understanding is to add positional encoding
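
For reference, the tutorial multiplies the embedding by sqrt(d_model) before adding the positional encoding, which keeps the embedding and the added position signal at comparable magnitudes. A minimal sketch of that step (the sizes and the random stand-in for the sinusoidal table are illustrative):

    import tensorflow as tf

    d_model, vocab_size, seq_len = 512, 8000, 40             # illustrative sizes

    embedding = tf.keras.layers.Embedding(vocab_size, d_model)
    pos_encoding = tf.random.normal((1, seq_len, d_model))   # stand-in for the sinusoidal table

    tokens = tf.random.uniform((2, seq_len), maxval=vocab_size, dtype=tf.int32)

    x = embedding(tokens)                              # (batch, seq_len, d_model)
    x *= tf.math.sqrt(tf.cast(d_model, tf.float32))    # scale the embedding up ...
    x += pos_encoding                                  # ... so the positional signal doesn't dominate it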

How to train the self-attention model?

Submitted by 前提是你 on 2020-01-25 09:25:08
Question: I understand the overall structure of the Transformer as in the figure below, but one thing that confuses me is the bottom of the decoder, which takes the right-shifted outputs as input. For example, when training the model on a pair of sentences in two languages, say the input is the sentence "I love you" and the corresponding French is "je t'aime", how does the model train? The input to the encoder is "I love you"; for the decoder, there are two things, one is "je t'aime", which should be
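
A small sketch of how the target sentence is usually split during training (teacher forcing), assuming explicit <start>/<end> tokens; this only illustrates the right shift, not a full training loop:

    # Target sentence with explicit boundary tokens.
    target = ['<start>', 'je', "t'aime", '<end>']

    decoder_input = target[:-1]    # ['<start>', 'je', "t'aime"]  -> fed to the decoder
    decoder_target = target[1:]    # ['je', "t'aime", '<end>']    -> compared with the output

    # At each position the decoder sees the ground-truth tokens up to that point
    # (the look-ahead mask hides the future ones) and is trained to predict the
    # next token, so the whole sentence is learned in one parallel pass.
    for seen, expected in zip(decoder_input, decoder_target):
        print(f'decoder has seen up to {seen!r} -> should predict {expected!r}')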