Hey, I am currently taking Stanford's cs243 reinforcement learning course on YouTube. From it, I understand that a policy is something like a functi