Problem with implementing temporal difference based on actor-critic

慢半拍i 2020-11-27 05:36

I implemented a simple actor-critic model in TensorFlow 2.3.1 to learn the CartPole environment, but it is not learning at all. The average scores over each block of 50 episodes are below.

Answers
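The asker's code is not shown, so the cause cannot be pinned down from the post alone. For reference, here is a minimal sketch of the per-transition quantities in a one-step temporal-difference actor-critic update, written in plain Python so it can be checked by hand. The function name `td_actor_critic_losses` and its parameters are hypothetical, not taken from the question. Two things that commonly stop this kind of agent from learning are bootstrapping from a terminal state and letting gradients flow through the TD target or through the TD error in the actor loss; the sketch marks where both matter.

```python
import math


def td_actor_critic_losses(log_prob, value, next_value, reward, done, gamma=0.99):
    """Return (td_error, actor_loss, critic_loss) for a single transition.

    log_prob   : log pi(a|s) of the action actually taken
    value      : critic estimate V(s)
    next_value : critic estimate V(s')
    done       : True if s' is terminal
    """
    # Bootstrapped target; a terminal next state contributes no future value.
    target = reward + gamma * next_value * (0.0 if done else 1.0)

    # TD error, used as the advantage estimate.
    td_error = target - value

    # Actor: raise the log-probability of actions with positive TD error.
    # In an autodiff framework, td_error should be treated as a constant here.
    actor_loss = -log_prob * td_error

    # Critic: squared TD error. The target should also be treated as a constant.
    critic_loss = td_error ** 2

    return td_error, actor_loss, critic_loss


# Example: one non-terminal transition with a 50% action probability.
td, a_loss, c_loss = td_actor_critic_losses(
    log_prob=math.log(0.5), value=1.0, next_value=2.0,
    reward=1.0, done=False, gamma=0.5,
)
```

In TensorFlow, the "treated as a constant" parts would typically be wrapped in `tf.stop_gradient`; whether that applies to the code in the question is an assumption, since that code is not included in the post.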