Question
I figured I'd build an attention model, but got confused (again) regarding each layer's dimensions. Say I have 90 documents, each composed of 200 sentence-vectors. The sentence-vectors are of size 500 (each sentence is embedded as 1x500). The task is a classification of each document, and the sentence-vectors are already embedded!
# Creating random features
import numpy as np

xx = np.random.randint(100, size=(90, 200, 500))  # 90 docs x 200 sentences x 500-dim sentence vectors
y = np.random.randint(2, size=(90, 1))             # binary label per document
In the end, the attention layer should return the most important sentences, i.e. the ones that contribute most to the document's classification.
My thoughts, and what I assume, are:

- The input has to be of shape (no_sentences_per_doc, sentence_embedding), so sent_input is a tensor of shape (?, 200, 500).
- The LSTM layer has to be half the no_sentences_per_doc size, because the Bidirectional layer doubles the length of the LSTM, and what I want are the corresponding 'relevant' sentences.
- Then the output is passed to a Dense layer (to maximize the output from the LSTM layer), which has to be of size no_sentences_per_doc.
- And since the LSTM just 'remembers' the sequence, I have to keep track with the AttentionLayer, so the Dense output is passed to an attention layer of size no_sentences_per_doc.
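To sanity-check these shape assumptions, here is a minimal trace of the plain Keras layers (no attention yet; the layer sizes are the ones from the model below, and the shapes in the comments are what I expect):

from tensorflow.keras.layers import Input, Bidirectional, LSTM, Dense

no_sentences_per_doc = 200  # sentences per document
sentence_embedding = 500    # size of each sentence vector

x = Input(shape=(no_sentences_per_doc, sentence_embedding))  # (None, 200, 500)
h = Bidirectional(LSTM(100, return_sequences=True))(x)       # (None, 200, 200): 2 * 100 units on the last axis
d = Dense(200, activation='relu')(h)                         # (None, 200, 200)
print(x.shape, h.shape, d.shape)
# The attention layer should then collapse the time axis:
# (None, 200, 200) -> (None, 200), plus one attention coefficient per sentence.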
Can someone please verify my thoughts / the model's architecture?
# Sentence level attention model
from tensorflow.keras.layers import Input, Dense, LSTM, Bidirectional
# AttentionLayer is the custom layer from the blog post linked below

sent_input = Input(shape=(no_sentences_per_doc, sentence_embedding), dtype='int32', name='sent_input')
sent_lstm = Bidirectional(LSTM(100, return_sequences=True), name='sent_lstm')(sent_input)
sent_dense = Dense(200, activation='relu', name='sent_dense')(sent_lstm)
sent_att, sent_coeffs = AttentionLayer(200, return_coefficients=True, name='sent_attention')(sent_dense)
preds = Dense(1, activation='softmax', name='output')(sent_att)
AttentionLayer taken from here: https://humboldt-wi.github.io/blog/research/information_systems_1819/group5_han/
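Since the blog's code is not pasted here, the following is only a minimal sketch of how such an additive, HAN-style attention layer with a return_coefficients flag typically looks; it is not the blog's exact implementation, and the attention_dim argument name is my own assumption:

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class AttentionLayer(Layer):
    """Additive attention over the time (sentence) axis; a sketch, not the blog's code."""
    def __init__(self, attention_dim=200, return_coefficients=False, **kwargs):
        self.attention_dim = attention_dim
        self.return_coefficients = return_coefficients
        super().__init__(**kwargs)

    def build(self, input_shape):
        # input_shape: (batch, timesteps, features)
        feature_dim = int(input_shape[-1])
        self.W = self.add_weight(name='W', shape=(feature_dim, self.attention_dim),
                                 initializer='glorot_uniform', trainable=True)
        self.b = self.add_weight(name='b', shape=(self.attention_dim,),
                                 initializer='zeros', trainable=True)
        self.u = self.add_weight(name='u', shape=(self.attention_dim, 1),
                                 initializer='glorot_uniform', trainable=True)
        super().build(input_shape)

    def call(self, x):
        # score each sentence, then softmax over the time axis
        uit = K.tanh(K.dot(x, self.W) + self.b)       # (batch, timesteps, attention_dim)
        ait = K.softmax(K.dot(uit, self.u), axis=1)   # (batch, timesteps, 1) attention coefficients
        weighted = x * ait                            # weight each sentence vector
        output = K.sum(weighted, axis=1)              # (batch, features) document vector
        if self.return_coefficients:
            return [output, ait]
        return output

With a layer like this, sent_coeffs holds one weight per sentence for each document, so after training I could build a model that also outputs sent_coeffs and sort the coefficients per document (e.g. with np.argsort) to get the most important sentences, which is the goal described above.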
Source: https://stackoverflow.com/questions/59791292/lstm-attention-layer-network-dimensions-for-classification-task