I implemented a Q-learning algorithm that uses scikit-learn's SGDRegressor for function approximation. However, the algorithm is not working properly. My theory at the moment is