I've recently made an attempt to implement a basic Q-Learning algorithm in Golang. Note that I'm new to Reinforcement Learning and AI in general, so the error may very wel
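If I've understood correctly, your Q-learning update rule uses both the current reward and the previous reward. However, the Q-learning rule uses only one reward (x are states and u are actions):

$$Q(x_t, u_t) \leftarrow Q(x_t, u_t) + \alpha \left[ r_{t+1} + \gamma \max_{u'} Q(x_{t+1}, u') - Q(x_t, u_t) \right]$$

Here α is the learning rate and γ is the discount factor.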
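On the other hand, you are assuming that the current reward is the same as the Qmax value, which is not true: the reward is what the environment returns for the transition, whereas Qmax is the largest estimated Q value over the actions available in the next state. So you are probably misunderstanding the Q-learning algorithm.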
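To make the distinction concrete, here is a minimal sketch in Go of a single tabular Q-learning update. It assumes Q is stored as a slice of slices indexed by state and action; the names `QTable`, `maxQ`, `Update`, `alpha`, and `gamma` are illustrative and are not taken from your code.

```go
package main

import "fmt"

// QTable holds one row of action values per state: q[x][u].
// (Hypothetical type, used only for this illustration.)
type QTable [][]float64

// maxQ returns the largest action value in a row.
// It assumes the row is non-empty.
func maxQ(row []float64) float64 {
	best := row[0]
	for _, v := range row[1:] {
		if v > best {
			best = v
		}
	}
	return best
}

// Update applies one Q-learning step for the transition (x, u) -> xNext.
// Only the single reward observed on this transition is used, and the
// max over next-state Q values is computed separately from that reward.
func Update(q QTable, x, u int, reward float64, xNext int, alpha, gamma float64) {
	target := reward + gamma*maxQ(q[xNext])
	q[x][u] += alpha * (target - q[x][u])
}

func main() {
	// Two states, two actions each; the values are arbitrary.
	q := QTable{{0, 0}, {1, 2}}
	Update(q, 0, 1, 1.0, 1, 0.5, 0.9)
	fmt.Println(q[0][1]) // roughly 1.4, i.e. 0.5 * (1 + 0.9*2 - 0)
}
```

Note that `reward` and `maxQ(q[xNext])` enter the target as two separate terms; no previous reward appears anywhere in the update.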