Policy Gradient Loss in PyTorch

挽巷 2021-01-16 20:56

Version 1

import torch.nn.functional as F

y = episode_a.argmax(-1)       # episode_a is one-hot actions, shape [T, n_actions]
action_preds = self.net(ep_s)  # logits before softmax, shape [T, n_actions]
neg_log_lik = F.cross_entropy(action_preds, y, reduction='none')  # per-step -log pi(a_t | s_t)
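For context, here is a minimal self-contained sketch of the REINFORCE-style loss the snippet appears to be building. The policy network, the random episode data, and the per-step `returns` weighting are assumptions for illustration, not from the original post:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
T, n_actions, n_obs = 5, 3, 4

net = torch.nn.Linear(n_obs, n_actions)  # stand-in for self.net (assumed)
ep_s = torch.randn(T, n_obs)             # episode states (assumed shape)
episode_a = F.one_hot(torch.randint(0, n_actions, (T,)), n_actions).float()
returns = torch.ones(T)                  # assumed per-step returns G_t

y = episode_a.argmax(-1)                 # recover integer action indices from one-hot
action_preds = net(ep_s)                 # logits before softmax, shape [T, n_actions]
neg_log_lik = F.cross_entropy(action_preds, y, reduction='none')  # -log pi(a_t | s_t), shape [T]
loss = (neg_log_lik * returns).mean()    # REINFORCE surrogate loss (scalar)
```

With `reduction='none'`, `F.cross_entropy` returns one negative log-likelihood per time step, so each can be weighted by its return before averaging; calling `loss.backward()` then yields the policy gradient estimate.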


        