I am trying to use reinforcement learning to solve an optimal combination problem. The combination problem means I have like 50 actions, and I hope to find the optimal seque