Question
In Deep Reinforcement Learning, using continuous action spaces, why does it seem to be common practice to clamp the action right before the agent's execution?
Examples:
OpenAI Gym Mountain Car https://github.com/openai/gym/blob/master/gym/envs/classic_control/continuous_mountain_car.py#L57
Unity 3DBall https://github.com/Unity-Technologies/ml-agents/blob/master/unity-environment/Assets/ML-Agents/Examples/3DBall/Scripts/Ball3DAgent.cs#L29
Isn't information lost by doing so? For example, if the model outputs +10 for velocity (moving), which is then clamped to +1, the executed action effectively becomes discrete (as far as its execution is concerned). For fine-grained movement, wouldn't it make more sense to multiply the output by something like 0.1? (See the sketch below for the kind of clamping meant here.)
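A minimal sketch of the clamping pattern the linked environments use, where the policy output is clipped to the valid range at execution time. The names MIN_ACTION, MAX_ACTION, and execute are illustrative, not the environments' actual API:

```python
import numpy as np

# Assumed action bounds, mirroring the [-1, 1] range of MountainCarContinuous.
MIN_ACTION, MAX_ACTION = -1.0, 1.0

def execute(raw_action: float) -> float:
    """Clamp the policy output to the valid range before execution."""
    force = float(np.clip(raw_action, MIN_ACTION, MAX_ACTION))
    return force

print(execute(10.0))  # a policy output of +10 is executed as +1.0
print(execute(0.3))   # values inside the range pass through unchanged
```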
Answer 1:
This is probably done simply to enforce constraints on what the agent can do. Maybe the agent would like to output an action that increases velocity by 1,000,000. But if the agent is a self-driving car with a weak engine that can accelerate by at most 1 unit, we don't care that the agent would hypothetically like to accelerate by more units. The car's engine has limited capabilities.
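The clamp expresses the engine's physical limit; nothing stops the policy from also bounding its own outputs so the clamp rarely activates. A minimal sketch, assuming action bounds of [-1, 1]: squashing the unbounded network output with tanh (as tanh-squashed policies such as SAC do) keeps fine-grained control inside the range, while the environment-side clip remains only as a safeguard. The names squash_to_bounds and clamp are illustrative:

```python
import numpy as np

LOW, HIGH = -1.0, 1.0  # assumed action bounds of the environment

def squash_to_bounds(raw_output: float) -> float:
    """Map an unbounded network output smoothly into [LOW, HIGH]."""
    return LOW + (HIGH - LOW) * (np.tanh(raw_output) + 1.0) / 2.0

def clamp(action: float) -> float:
    """Environment-side safeguard: the engine cannot exceed its limits."""
    return float(np.clip(action, LOW, HIGH))

for raw in (-10.0, -0.5, 0.0, 0.5, 10.0):
    a = squash_to_bounds(raw)
    print(raw, "->", clamp(a))  # clamp is a no-op here; it only guards the bounds
```

With this design the agent retains gradations of control near the limits instead of saturating, and the clamp in the environment only matters if the policy misbehaves.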
Source: https://stackoverflow.com/questions/48046656/why-should-continuous-actions-be-clamped