When to use a certain Reinforcement Learning algorithm?

臣服心动 2021-01-30 17:53

I'm studying Reinforcement Learning and reading Sutton's book for a university course. Besides the classic DP, MC, TD and Q-Learning algorithms, I'm reading about policy gradi

1 Answer
  • 2021-01-30 18:35

    Briefly:

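    Does the agent learn online or offline? The answer helps you choose between on-policy and off-policy algorithms (e.g. on-policy: SARSA, off-policy: Q-learning). Learning offline from previously collected data requires an off-policy method, because the data come from a different policy than the one being learned. On-policy methods are more restrictive and need more care.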
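    To make the SARSA versus Q-learning distinction concrete, here is a minimal tabular sketch of the two update rules. It assumes the value table `Q` is a NumPy array indexed by `[state, action]`; the function names and the `alpha` (step size) and `gamma` (discount) parameters are illustrative choices, not something taken from the question.

```python
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: bootstrap from the action the agent actually takes next."""
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy: bootstrap from the greedy action in s_next,
    regardless of which action the behaviour policy takes there."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```

    The only difference is the bootstrap term: SARSA needs `a_next` from the policy being followed, while Q-learning takes the maximum over actions and can therefore learn from transitions generated by any other policy.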

    Can we separate the exploration and exploitation phases? The two are normally balanced against each other. For example, in epsilon-greedy action selection you explore (pick a random action) with probability epsilon and exploit (pick the greedy action) with probability 1 - epsilon. You can instead separate the two and have the algorithm explore first (e.g. by choosing random actions) and exploit afterwards. This is only possible when you are learning offline, probably with a model of the system dynamics, and it normally means collecting a lot of sample data in advance.
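    The two ways of handling exploration can be sketched as follows. This assumes `q_values` is a 1-D array of action values for the current state and `rng` is a NumPy random generator; `explore_then_exploit` and its `n_explore_steps` parameter are hypothetical names used only for illustration.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """Interleaved: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # random action
    return int(np.argmax(q_values))              # greedy action

def explore_then_exploit(q_values, step, n_explore_steps, rng):
    """Separated: act randomly for the first n_explore_steps, then greedily."""
    if step < n_explore_steps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q = np.array([0.1, 0.7, 0.2])
action = epsilon_greedy(q, epsilon=0.1, rng=rng)
```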

    Can we perform enough exploration? How much exploration is feasible depends on the problem. For example, if you have a simulation model of the problem in memory, you can explore as much as you want. Exploration in the real system, however, is limited by the resources you have (e.g. energy, time, ...).

    Are the states and actions continuous? Answering this helps you choose the right approach (algorithm). RL algorithms exist for both discrete and continuous spaces, and some "continuous" algorithms internally discretize the state or action space.
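    As one possible illustration of discretizing a continuous state, the sketch below maps each state dimension onto a uniform grid so that a tabular method can be used. The `discretize` helper, the bounds `low` and `high`, and the choice of uniform bins are all assumptions made for this example; real algorithms may use other schemes such as tile coding.

```python
import numpy as np

def discretize(state, low, high, n_bins):
    """Map a continuous state vector to a tuple of bin indices.

    Each dimension is split into n_bins equal-width bins between
    low and high; values outside the bounds are clipped to the edge bins.
    """
    state = np.asarray(state, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    ratio = (state - low) / (high - low)
    idx = np.floor(ratio * n_bins).astype(int)
    return tuple(int(i) for i in np.clip(idx, 0, n_bins - 1))

# A 2-D state in [-1, 1] x [-1, 1], with 4 bins per dimension.
cell = discretize([0.0, 0.5], low=[-1.0, -1.0], high=[1.0, 1.0], n_bins=4)
```

    The resulting tuple can be used directly as an index into a NumPy value table or as a dictionary key.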
