I am quite new to reinforcement learning and am trying to understand the PPO algorithm. I have trouble understanding how the advantage is implemented.
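For context, many PPO implementations compute the advantage with generalized advantage estimation (GAE). The PPO paper uses a truncated form of it over a fixed-length rollout. The following is a minimal plain-Python sketch, not any particular library's code. It assumes a single rollout of length T, critic estimates `values[t]` ≈ V(s_t), a bootstrap value `last_value` ≈ V(s_T), and `dones[t]` marking episode ends. The name `compute_gae` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def compute_gae(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: return (advantages, returns) for one rollout.

    values[t] is the critic's estimate V(s_t); last_value is V(s_T) for the
    state after the final step. dones[t] is True if the episode ended at
    step t, which stops bootstrapping across the episode boundary.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    # Walk backwards so each A_t can reuse A_{t+1}.
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # A_t = delta_t + gamma * lambda * A_{t+1} (reset at episode ends)
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Critic targets: R_t = A_t + V(s_t)
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

With `lam=1` this reduces to discounted return minus V(s_t). With `lam=0` it reduces to the one-step TD error. Many implementations also normalize the advantages (zero mean, unit variance) per minibatch before using them in the clipped objective.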
I watched t