How do you use an Attention layer properly in an LSTM autoencoder network? There are many differing opinions on this question online. In Keras there are predefined Attention layers (such as `Attention` and `AdditiveAttention`).
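
For reference, here is a minimal sketch of the kind of architecture I mean, using Keras's built-in `Attention` layer. The shapes, latent size, and the choice of feeding the encoder sequence back in as the decoder input are assumptions for illustration, not a definitive recipe:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

timesteps, n_features, latent_dim = 20, 8, 64  # assumed example dimensions

inputs = layers.Input(shape=(timesteps, n_features))

# Encoder: return the full state sequence so attention has something to attend over.
encoder_seq, state_h, state_c = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(inputs)

# Decoder: here the encoder output sequence is reused as the decoder input
# (one common setup among several possible ones).
decoder_seq = layers.LSTM(latent_dim, return_sequences=True)(
    encoder_seq, initial_state=[state_h, state_c])

# Keras's built-in dot-product Attention: query = decoder states, value = encoder states.
context = layers.Attention()([decoder_seq, encoder_seq])

# Combine decoder output with the attention context and project back to the input features.
concat = layers.Concatenate()([decoder_seq, context])
outputs = layers.TimeDistributed(layers.Dense(n_features))(concat)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.summary()
```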