So I plugged QLearningDiscreteDense into a Dots and Boxes game I made, using a custom MDP environment I wrote for it. The problem is that it chooses action 0 every time, the firs
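Without the rest of the post this is only a guess, but two things often make a greedy Q-learning policy keep picking action 0 in a board game. First, many argmax implementations return the first index on a tie, so near-identical Q-values early in training resolve to 0. Second, the environment may still offer moves that are already taken. The sketch below shows action masking plus random tie-breaking in plain Java. `GreedyActionPicker`, `pickAction`, and the `legal` mask are illustrative names, not RL4J classes or API. How you would hook something like this into QLearningDiscreteDense depends on your RL4J version and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, NOT part of RL4J: greedy action selection that
// (1) skips illegal moves (e.g. edges already drawn in Dots and Boxes) and
// (2) breaks ties randomly, so equal Q-values do not always resolve to index 0.
public class GreedyActionPicker {

    public static int pickAction(double[] qValues, boolean[] legal, Random rng) {
        double best = Double.NEGATIVE_INFINITY;
        List<Integer> ties = new ArrayList<>();
        for (int a = 0; a < qValues.length; a++) {
            if (!legal[a]) {
                continue; // never consider an action the game no longer allows
            }
            if (qValues[a] > best) {
                // New strict maximum: restart the tie list with this action.
                best = qValues[a];
                ties.clear();
                ties.add(a);
            } else if (qValues[a] == best) {
                // Same value as the current maximum: remember it as a tie.
                ties.add(a);
            }
        }
        if (ties.isEmpty()) {
            throw new IllegalStateException("no legal actions");
        }
        // Pick uniformly among the tied best actions instead of the first one.
        return ties.get(rng.nextInt(ties.size()));
    }

    public static void main(String[] args) {
        double[] q = {0.0, 0.0, 0.0, 0.0};           // untrained net: all equal
        boolean[] legal = {false, true, true, true}; // edge 0 already drawn
        int a = pickAction(q, legal, new Random(42));
        System.out.println("chosen action: " + a);
    }
}
```

With all Q-values equal, the choice is spread over the legal actions instead of collapsing onto index 0. Once the network does learn a clear preference, the strict maximum still wins.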