Problem implementing a temporal-difference-based actor-critic

I implemented a simple actor-critic model in TensorFlow 2.3.1 to learn the CartPole environment, but it is not learning at all. The average score over each block of 50 episodes is below: