When training in an OpenAI Gym environment modeled on the classic ones, the model achieves good results and completes the task. Validating on unseen data, however, yields considerably lower results. I have tried: