Why is OpenAI's PPO2 implementation differentiable?

伪装坚强ぢ 2021-02-20 01:52

I'm trying to understand the concept behind the implementation of the OpenAI PPO2 algorithm. The loss function that is minimized is as follows: loss = pg_loss - entropy *
