Why does the Actor network in the DDPG algorithm always produce outputs skewed toward +1/-1?

I am just looking for some clues/hints on the behavior of my DDPG algorithm.

I have a DDPG algorithm, implemented in PyTorch, interacting with a continuous environment. The Acto