Question
I figured I'd build an attention model, but got confused (again) regarding each layer's dimensions. Say I have 90 documents, each composed of 200 sentence-vectors. The sentence-vectors are of size 500 (each sentence is embedded as 1x500). The task is a classification of each document, and the sentence-vectors are already embedded!
# Creating random features
import numpy as np

xx = np.random.randint(100, size=(90, 200, 500))  # 90 docs x 200 sentences x 500-dim sentence vectors
y = np.random.randint(2, size=(90, 1))             # binary label per document
In the end, the attention layer should return the most important sentences, i.e. the ones that contribute most to the document's classification.
My thoughts, and what I assume, are:

- The input has to be of shape (no_sentences_per_doc, sentence_embedding), so sent_input is a tensor of shape (?, 200, 500).
- The LSTM layer has to be half the no_sentences_per_doc size, because the Bidirectional layer doubles the length of the LSTM, and what I want are the corresponding 'relevant' sentences.
- Then the output is passed to a Dense layer (to maximize the output from the LSTM layer), which has to be of size no_sentences_per_doc.
- And since the LSTM just 'remembers' the sequence, I have to keep track with the AttentionLayer, so the Dense output is passed to an attention layer of size no_sentences_per_doc.
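To sanity-check these shape assumptions, here is a minimal trace of the plain Keras layers (no attention yet; the layer sizes are the ones from the model below, and the shapes in the comments are what I expect):

from tensorflow.keras.layers import Input, Bidirectional, LSTM, Dense

no_sentences_per_doc = 200  # sentences per document
sentence_embedding = 500    # size of each sentence vector

x = Input(shape=(no_sentences_per_doc, sentence_embedding))  # (None, 200, 500)
h = Bidirectional(LSTM(100, return_sequences=True))(x)       # (None, 200, 200): 2 * 100 units on the last axis
d = Dense(200, activation='relu')(h)                         # (None, 200, 200)
print(x.shape, h.shape, d.shape)
# The attention layer should then collapse the time axis:
# (None, 200, 200) -> (None, 200), plus one attention coefficient per sentence.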
Can someone please verify my thoughts / the model's architecture?
# Sentence level attention model
from tensorflow.keras.layers import Input, Dense, LSTM, Bidirectional
# AttentionLayer is the custom layer from the blog post linked below

sent_input = Input(shape=(no_sentences_per_doc, sentence_embedding), dtype='int32', name='sent_input')
sent_lstm = Bidirectional(LSTM(100, return_sequences=True), name='sent_lstm')(sent_input)
sent_dense = Dense(200, activation='relu', name='sent_dense')(sent_lstm)
sent_att, sent_coeffs = AttentionLayer(200, return_coefficients=True, name='sent_attention')(sent_dense)
preds = Dense(1, activation='softmax', name='output')(sent_att)
AttentionLayer taken from here: https://humboldt-wi.github.io/blog/research/information_systems_1819/group5_han/
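Since the blog's code is not pasted here, the following is only a minimal sketch of how such an additive, HAN-style attention layer with a return_coefficients flag typically looks; it is not the blog's exact implementation, and the attention_dim argument name is my own assumption:

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class AttentionLayer(Layer):
    """Additive attention over the time (sentence) axis; a sketch, not the blog's code."""
    def __init__(self, attention_dim=200, return_coefficients=False, **kwargs):
        self.attention_dim = attention_dim
        self.return_coefficients = return_coefficients
        super().__init__(**kwargs)

    def build(self, input_shape):
        # input_shape: (batch, timesteps, features)
        feature_dim = int(input_shape[-1])
        self.W = self.add_weight(name='W', shape=(feature_dim, self.attention_dim),
                                 initializer='glorot_uniform', trainable=True)
        self.b = self.add_weight(name='b', shape=(self.attention_dim,),
                                 initializer='zeros', trainable=True)
        self.u = self.add_weight(name='u', shape=(self.attention_dim, 1),
                                 initializer='glorot_uniform', trainable=True)
        super().build(input_shape)

    def call(self, x):
        # score each sentence, then softmax over the time axis
        uit = K.tanh(K.dot(x, self.W) + self.b)       # (batch, timesteps, attention_dim)
        ait = K.softmax(K.dot(uit, self.u), axis=1)   # (batch, timesteps, 1) attention coefficients
        weighted = x * ait                            # weight each sentence vector
        output = K.sum(weighted, axis=1)              # (batch, features) document vector
        if self.return_coefficients:
            return [output, ait]
        return output

With a layer like this, sent_coeffs holds one weight per sentence for each document, so after training I could build a model that also outputs sent_coeffs and sort the coefficients per document (e.g. with np.argsort) to get the most important sentences, which is the goal described above.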
Source: https://stackoverflow.com/questions/59791292/lstm-attention-layer-network-dimensions-for-classification-task