attention-model

Is there any way to convert a PyTorch tensor to a TensorFlow tensor?

Submitted by 我是研究僧i on 2021-02-11 16:39:22
Question: https://github.com/taoshen58/BiBloSA/blob/ec67cbdc411278dd29e8888e9fd6451695efc26c/context_fusion/self_attn.py#L29 I need to use multi_dimensional_attention from the above link, which is implemented in TensorFlow, but I am using PyTorch. Can I convert a PyTorch tensor to a TensorFlow tensor, or do I have to implement it in PyTorch? The code I am trying to use is here; I have to pass 'rep_tensor' as a TensorFlow tensor type, but I have a PyTorch tensor. def multi_dimensional_attention(rep_tensor, rep_mask
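
One common workaround, sketched below, is to hop through NumPy; this assumes the PyTorch tensor is on the CPU (or can be moved there) and that no gradients need to flow across the framework boundary. The variable names are illustrative.

    import torch
    import tensorflow as tf

    # Example PyTorch tensor with an illustrative (batch, seq_len, hidden) shape.
    rep_tensor_torch = torch.randn(2, 5, 8)

    # Detach from the autograd graph, move to CPU, and convert via NumPy,
    # which both frameworks can read directly.
    rep_tensor_tf = tf.convert_to_tensor(rep_tensor_torch.detach().cpu().numpy())

    print(rep_tensor_tf.shape)  # (2, 5, 8)

Gradients do not cross the NumPy bridge, so if the attention output has to be trained end to end inside a PyTorch model, reimplementing multi_dimensional_attention in PyTorch is the safer route.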

Can't set the attribute “trainable_weights”, likely because it conflicts with an existing read-only

Submitted by 假如想象 on 2021-01-29 06:10:42
Question: My code was running perfectly in Colab, but today it's not running. It says: Can't set the attribute "trainable_weights", likely because it conflicts with an existing read-only @property of the object. Please choose a different name. I am using an LSTM with an attention layer. class Attention(Layer): def __init__(self, **kwargs): self.init = initializers.get('normal') #self.input_spec = [InputSpec(ndim=3)] super(Attention, self).__init__(**kwargs) def build(self, input_shape): assert len(input
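
This error typically comes from a build() method that assigns to self.trainable_weights, which newer Keras versions expose as a read-only property. A minimal sketch of the usual fix, assuming the layer only needs one weight matrix and one bias and that the number of time steps is fixed (shapes are illustrative), is to let add_weight register the variables instead of assigning the list by hand:

    import tensorflow as tf
    from tensorflow.keras import layers, initializers

    class Attention(layers.Layer):
        def __init__(self, **kwargs):
            super(Attention, self).__init__(**kwargs)
            self.init = initializers.get('normal')

        def build(self, input_shape):
            assert len(input_shape) == 3  # (batch, time_steps, features)
            # add_weight registers the variables with the layer, so Keras tracks
            # them in trainable_weights automatically; no assignment is needed.
            self.W = self.add_weight(name='att_weight',
                                     shape=(input_shape[-1], 1),
                                     initializer=self.init, trainable=True)
            self.b = self.add_weight(name='att_bias',
                                     shape=(input_shape[1], 1),
                                     initializer='zeros', trainable=True)
            super(Attention, self).build(input_shape)

        def call(self, x):
            # Score each time step, softmax over time, return the weighted sum.
            e = tf.math.tanh(tf.tensordot(x, self.W, axes=1) + self.b)  # (batch, time, 1)
            a = tf.nn.softmax(e, axis=1)
            return tf.reduce_sum(x * a, axis=1)                         # (batch, features)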

Understanding Bahdanau's Attention Linear Algebra

Submitted by 冷暖自知 on 2021-01-06 03:25:57
Question: Bahdanau's additive attention is recognized as the second part of equation 4 in the image below. I am trying to figure out the shapes of the matrices w1, w2, ht, hs and v in order to figure out how this mechanism is used in this paper. Can ht and hs have different final dimensions, say (batch size, total units) and (batch size, time window)? Equation 8 in the paper mentioned above seems to be doing this. Equation 8 in that paper has the notation below: what does this expand to exactly?
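
For reference, the additive score being asked about is usually written as score(ht, hs) = v^T tanh(W1·ht + W2·hs). The shape sketch below uses illustrative sizes; the point is that ht and hs may have different feature dimensions as long as W1 and W2 both project onto the same number of attention units, and the softmax runs over the source time dimension of hs:

    import tensorflow as tf

    batch, src_len = 4, 7                        # illustrative sizes
    dec_units, enc_units, att_units = 16, 32, 10

    ht = tf.random.normal((batch, dec_units))            # decoder state at one step
    hs = tf.random.normal((batch, src_len, enc_units))   # all encoder states

    W1 = tf.keras.layers.Dense(att_units, use_bias=False)   # projects ht
    W2 = tf.keras.layers.Dense(att_units, use_bias=False)   # projects hs
    v = tf.keras.layers.Dense(1, use_bias=False)

    # Both projections land in att_units, so the broadcasted sum is well defined.
    score = v(tf.math.tanh(W1(ht)[:, None, :] + W2(hs)))  # (batch, src_len, 1)
    weights = tf.nn.softmax(score, axis=1)                # one weight per source step
    context = tf.reduce_sum(weights * hs, axis=1)         # (batch, enc_units)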

Sequence to Sequence - for time series prediction

Submitted by ▼魔方 西西 on 2020-05-25 07:53:47
Question: I've tried to build a sequence to sequence model to predict a sensor signal over time based on its first few inputs (see figure below). The model works OK, but I want to 'spice things up' and try to add an attention layer between the two LSTM layers. Model code: def train_model(x_train, y_train, n_units=32, n
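
One common way to wire attention between two LSTM layers in Keras is sketched below; the window sizes, unit counts, and the use of the built-in dot-product Attention layer are illustrative assumptions, not the poster's actual model:

    import tensorflow as tf
    from tensorflow.keras import layers, models

    def build_model(n_steps_in=30, n_steps_out=20, n_features=1, n_units=32):
        inp = layers.Input(shape=(n_steps_in, n_features))
        # The first LSTM returns the full sequence so attention can weight every step.
        enc = layers.LSTM(n_units, return_sequences=True)(inp)
        # Built-in dot-product attention over the encoder outputs (used here as self-attention).
        att = layers.Attention()([enc, enc])
        # The second LSTM consumes the attended sequence and feeds the forecast head.
        dec = layers.LSTM(n_units)(att)
        out = layers.Dense(n_steps_out)(dec)
        model = models.Model(inp, out)
        model.compile(optimizer='adam', loss='mse')
        return model

    build_model().summary()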

Hierarchical Attention Network - model.fit generates error 'ValueError: Input dimension mis-match'

Submitted by ぐ巨炮叔叔 on 2020-05-14 21:28:13
Question: For background, I am referring to the Hierarchical Attention Network used for sentiment classification. For code: my full code is posted below, but it is just a simple revision of the original code posted by the author at the link above; I explain my changes below. For training data: here. For word embeddings: this is the GloVe embedding, here. Key config: Keras 2.0.9, Scikit-Learn 0.19.1, Theano 0.9.0. The original code posted at the link above takes a 3D-shaped input, i.e., (review,
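
For orientation, the hierarchical model expects every sample to be a fixed grid of sentences by words, so the training data must be padded into a 3D array before model.fit; a mismatch between that array and the declared Input shapes is the usual source of this error. A minimal preprocessing sketch with illustrative limits (not the poster's exact configuration):

    import numpy as np

    MAX_SENTS, MAX_SENT_LENGTH = 15, 100   # illustrative limits

    def to_3d(reviews, word_index):
        """reviews: list of reviews, each a list of sentences, each a list of tokens."""
        data = np.zeros((len(reviews), MAX_SENTS, MAX_SENT_LENGTH), dtype='int32')
        for i, review in enumerate(reviews):
            for j, sentence in enumerate(review[:MAX_SENTS]):
                for k, word in enumerate(sentence[:MAX_SENT_LENGTH]):
                    data[i, j, k] = word_index.get(word, 0)
        return data  # shape (reviews, sentences, words), as the hierarchical model expects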

How to visualize attention weights?

Submitted by 淺唱寂寞╮ on 2020-04-08 06:59:06
Question: Using this implementation, I have added attention to my RNN (which classifies the input sequences into two classes) as follows. visible = Input(shape=(250,)) embed = Embedding(vocab_size, 100)(visible) activations = keras.layers.GRU(250, return_sequences=True)(embed) attention = TimeDistributed(Dense(1, activation='tanh'))(activations) attention = Flatten()(attention) attention = Activation('softmax')(attention) attention = RepeatVector(250)(attention) attention = Permute([2, 1])(attention) sent
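
One way to inspect the learned weights is to name the softmax layer and build a second model that exposes its output, then plot the per-time-step weights. The sketch below assumes the softmax was created with name='attention_weights', that `model` is the trained classifier, and that x_test holds test sequences; all of these names are illustrative:

    import numpy as np
    import matplotlib.pyplot as plt
    from tensorflow.keras.models import Model

    # Assumes the softmax was built as:
    #   attention = Activation('softmax', name='attention_weights')(attention)
    att_model = Model(inputs=model.input,
                      outputs=model.get_layer('attention_weights').output)

    weights = att_model.predict(x_test[:1])[0]   # shape (250,), sums to 1

    plt.bar(np.arange(len(weights)), weights)    # one bar per input time step
    plt.xlabel('time step')
    plt.ylabel('attention weight')
    plt.show()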

Why is the embedding vector multiplied by a constant in the Transformer model?

Submitted by 本小妞迷上赌 on 2020-04-06 03:05:22
Question: I am learning to apply the Transformer model proposed in Attention Is All You Need, following the official TensorFlow tutorial Transformer model for language understanding. As the section Positional encoding says: Since this model doesn't contain any recurrence or convolution, positional encoding is added to give the model some information about the relative position of the words in the sentence. The positional encoding vector is added to the embedding vector. My understanding is to add positional encoding
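
For reference, the tutorial multiplies the embedding by sqrt(d_model) before adding the positional encoding, which keeps the embedding and the added position signal at comparable magnitudes. A minimal sketch of that step (the sizes and the random stand-in for the sinusoidal table are illustrative):

    import tensorflow as tf

    d_model, vocab_size, seq_len = 512, 8000, 40             # illustrative sizes

    embedding = tf.keras.layers.Embedding(vocab_size, d_model)
    pos_encoding = tf.random.normal((1, seq_len, d_model))   # stand-in for the sinusoidal table

    tokens = tf.random.uniform((2, seq_len), maxval=vocab_size, dtype=tf.int32)

    x = embedding(tokens)                              # (batch, seq_len, d_model)
    x *= tf.math.sqrt(tf.cast(d_model, tf.float32))    # scale the embedding up ...
    x += pos_encoding                                  # ... so the positional signal doesn't dominate it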

How to train the self-attention model?

Submitted by 前提是你 on 2020-01-25 09:25:08
Question: I understand the overall structure of the Transformer as in the figure below, but one thing that confuses me is the bottom of the decoder, which takes the right-shifted outputs as input. For example, when training the model on a pair of sentences in two languages, say the input is the sentence "I love you" and the corresponding French is "je t'aime", how does the model train? The input to the encoder is "I love you"; for the decoder, there are two things, one is "je t'aime", which should be
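
A small sketch of how the target sentence is usually split during training (teacher forcing), assuming explicit <start>/<end> tokens; this only illustrates the right shift, not a full training loop:

    # Target sentence with explicit boundary tokens.
    target = ['<start>', 'je', "t'aime", '<end>']

    decoder_input = target[:-1]    # ['<start>', 'je', "t'aime"]  -> fed to the decoder
    decoder_target = target[1:]    # ['je', "t'aime", '<end>']    -> compared with the output

    # At each position the decoder sees the ground-truth tokens up to that point
    # (the look-ahead mask hides the future ones) and is trained to predict the
    # next token, so the whole sentence is learned in one parallel pass.
    for seen, expected in zip(decoder_input, decoder_target):
        print(f'decoder has seen up to {seen!r} -> should predict {expected!r}')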