Highlighting important words in a sentence using Deep Learning

逝去的感伤 2021-02-06 19:18

I am trying to highlight the important words in the IMDB dataset, i.e. the words that contributed most to the final sentiment analysis prediction.

The dataset looks like this:

X_train - A revie

1 answer
  •  心在旅途
    2021-02-06 19:35

    Here is a version with attention (not hierarchical), but you should be able to adapt it to a hierarchical model too; if not, I can help out. The trick is to define two models that share the same layers: one for training (model) and one for extracting the attention values (model_with_attention_output):

    # Tensorflow 1.9; Keras 2.2.0 (latest versions)
    # should be backwards compatible back to Keras 2.0.9 and tf 1.5
    from keras.models import Model
    from keras.layers import *
    from keras import backend as K  # K.sum is used in the Lambda layer below
    import numpy as np
    
    dictionary_size=1000
    
    def create_models():
      #Get a sequence of indexes of words as input:
      # Keras supports dynamic input lengths if you provide (None,) as the 
      #  input shape
      inp = Input((None,))
      #Embed words into vectors of size 10 each:
      # Output shape is (None,10)
      embs = Embedding(dictionary_size, 10)(inp)
      # Run LSTM on these vectors and return output on each timestep
      # Output shape is (None,5)
      lstm = LSTM(5, return_sequences=True)(embs)
      ##Attention Block
      #Transform each timestep into 1 value (attention_value) 
      # Output shape is (None,1)
      attention = TimeDistributed(Dense(1))(lstm)
      #By running softmax on axis 1 we force attention_values
      # to sum up to 1. We are effectively assigning a "weight" to each timestep
      # Output shape is still (None,1) but each value changes
      attention_vals = Softmax(axis=1)(attention)
      # Multiply the encoded timestep by the respective weight
      # I.e. we are scaling each timestep based on its weight
      # Output shape is (None,5): (None,5)*(None,1)=(None,5)
      scaled_vecs = Multiply()([lstm,attention_vals])
      # Sum up all scaled timesteps into 1 vector 
      # i.e. obtain a weighted sum of timesteps
      # Output shape is (5,) : Observe the time dimension got collapsed
      context_vector = Lambda(lambda x: K.sum(x,axis=1))(scaled_vecs)
      ##Attention Block over
      # Get the output out
      out = Dense(1,activation='sigmoid')(context_vector)
    
      model = Model(inp, out)
      model_with_attention_output = Model(inp, [out, attention_vals])
      model.compile(optimizer='adam',loss='binary_crossentropy')
      return model, model_with_attention_output
    
    model,model_with_attention_output = create_models()
    
    
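    # Train on a single toy example: word indexes [1,2,3] with label 1
    model.fit(np.array([[1,2,3]]), np.array([1]), batch_size=1)
    # predict returns [out, attention_vals]; index 1 selects the attention values
    print('Attention over each word: ', model_with_attention_output.predict(np.array([[1,2,3]]), batch_size=1)[1])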
    

    The output is a NumPy array of shape (batch, timesteps, 1) holding the attention value of each word: the higher the value, the more important the word was for the prediction.
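
    To turn these values into highlighted text, the weights can be matched back to the words by position. The following is only a minimal sketch, not part of the original answer: the helper name highlight_top_words, the top_k parameter, the ** markers and the example weights are all made up for illustration. It assumes one weight per word, in the same order as the tokens fed to the model, and takes a single example of shape (timesteps, 1), i.e. the [0] slice of the attention output.

```python
def highlight_top_words(words, attention, top_k=2):
    """Wrap the top_k highest-attention words in ** markers (hypothetical helper)."""
    # attention is expected to have shape (timesteps, 1): one weight per word.
    # This works for nested lists as well as for a NumPy array slice.
    weights = [float(row[0]) for row in attention]
    if len(weights) != len(words):
        raise ValueError("need exactly one attention weight per word")
    # Rank word positions from highest to lowest attention weight
    ranked = sorted(range(len(words)), key=lambda i: weights[i], reverse=True)
    top = set(ranked[:top_k])
    # Rebuild the sentence, marking only the selected positions
    return " ".join("**%s**" % w if i in top else w for i, w in enumerate(words))


# Made-up weights for illustration; real ones would come from
# model_with_attention_output.predict(...)[1][0]
words = ["this", "movie", "was", "great"]
attention = [[0.05], [0.25], [0.10], [0.60]]
print(highlight_top_words(words, attention))  # this **movie** was **great**
```

    If the input sequences are padded, the padding positions would need to be dropped before matching weights to words.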

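    EDIT: You might want to replace lstm with embs in the multiplication, so that the weights scale the raw word embeddings instead of the LSTM outputs. This should give more interpretable attention values, but it will likely hurt model performance.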
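
    The multiplication in question is the weighted sum at the end of the attention block. As a rough illustration in plain Python, with no Keras involved: the function name attention_pool and the toy numbers below are assumptions made for this sketch, and timestep_vectors stands in for whichever vectors get scaled (the LSTM outputs in the code above, or the embeddings in the variant from the edit).

```python
import math


def attention_pool(timestep_vectors, scores):
    """Softmax the scores over time, then return (weights, weighted sum of vectors)."""
    # Subtract the max score before exponentiating for numerical stability
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    # One weight per timestep; the weights sum to 1
    weights = [e / total for e in exps]
    dim = len(timestep_vectors[0])
    # Scale each timestep vector by its weight and sum over the time axis
    context = [
        sum(w * vec[i] for w, vec in zip(weights, timestep_vectors))
        for i in range(dim)
    ]
    return weights, context


# Three timesteps, each a 2-dimensional vector (made-up numbers)
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 0.0]  # equal scores give equal weights
weights, context = attention_pool(vectors, scores)
print(weights)
print(context)
```

    In the Keras model the scores come from the TimeDistributed(Dense(1)) layer, so they are learned rather than given; only the softmax and the weighted sum are shown here.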
