How to implement gradient ascent in a Keras DQN
Question: I have built a reinforcement-learning DQN with variable-length sequences as inputs and positive and negative rewards calculated for actions. Although the model runs, something is wrong: average reward decreases over time, over single and multiple cycles of epsilon, and this does not change even after a significant period of training. My thinking is that this is due to using MeanSquaredError in Keras as the loss function (minimising error). So I am trying to implement gradient ascent instead.
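For context on the mechanics being asked about: gradient ascent steps *along* the gradient (`w += lr * grad`) to maximise an objective, whereas standard optimisers step against it to minimise a loss. Because Keras optimisers only minimise, ascent on an objective J is usually implemented as minimising -J (i.e. returning the negated value from a custom loss function). A minimal, framework-free sketch of the update rule, with a hypothetical concave objective:

```python
# Minimal sketch of gradient ascent (hypothetical objective, not the DQN itself).
# Maximise f(w) = -(w - 3)^2, whose maximum is at w = 3.
def f(w):
    return -(w - 3.0) ** 2

def grad_f(w):
    # Analytic gradient of f.
    return -2.0 * (w - 3.0)

w = 0.0   # initial parameter
lr = 0.1  # learning rate

for _ in range(100):
    # Ascent step: ADD the gradient (descent would subtract it).
    # Equivalently, one could minimise -f(w) with an ordinary optimiser,
    # which is how this is typically expressed in Keras (a custom loss
    # that returns the negated objective).
    w += lr * grad_f(w)

# w has converged close to the maximiser, w ≈ 3
```

Note this only shows the sign-flip mechanics; whether negating the loss is the right fix for a DQN is a separate question, since DQN training normally still *minimises* the error between predicted Q-values and TD targets.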