Deeplearning4j: QLearning network in a custom environment is choosing the same action every time, despite the heavy negative reward

难免孤独 2020-11-30 05:54

I plugged QLearningDiscreteDense into a Dots and Boxes game I made, using a custom MDP environment I created for it. The problem is that it chooses action 0 every time, the firs
