Are Q-learning and SARSA with greedy selection equivalent?

星月不相逢 2021-02-09 11:08

The difference between Q-learning and SARSA is that Q-learning compares the current state and the best possible next state, whereas SARSA compares the current state against the

3 answers

迷失自我 (original poster)

2021-02-09 11:33

Well, not actually. A key difference between SARSA and Q-learning is that SARSA is an on-policy algorithm (it follows the policy that it is learning), whereas Q-learning is an off-policy algorithm (it can follow any policy that fulfills some convergence requirements).

Notice that in the following pseudocode of both algorithms, SARSA chooses a' and s' and then updates the Q-function, while Q-learning first updates the Q-function and selects the next action to perform in the next iteration. That action is derived from the updated Q-function and is not necessarily equal to the a' selected to update Q.
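To make the ordering difference concrete, here is a minimal sketch with purely greedy selection. It is not the book's pseudocode. The one-state environment `env_step`, the constants `ALPHA` and `GAMMA`, the initial Q-values, and the tie-breaking rule are all assumptions chosen for illustration. Because every action loops back to the same state, the update can change which action is greedy before the next action is taken.

```python
ALPHA, GAMMA = 0.5, 0.5  # hypothetical step size and discount factor


def greedy(q_row):
    """Index of the largest value; ties go to the lowest index."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def env_step(state, action):
    """Hypothetical environment: a single state 0 that every action loops
    back to. Action 0 gives reward -1, action 1 gives reward 0."""
    reward = -1.0 if action == 0 else 0.0
    return 0, reward


def sarsa_actions(q, n_steps):
    """SARSA: choose a' first, update with Q(s', a'), then really take a'."""
    s, a, taken = 0, greedy(q[0]), []
    for _ in range(n_steps):
        taken.append(a)
        s2, r = env_step(s, a)
        a2 = greedy(q[s2])  # a' is chosen BEFORE the update
        q[s][a] += ALPHA * (r + GAMMA * q[s2][a2] - q[s][a])
        s, a = s2, a2  # committed to a', even if it is no longer greedy
    return taken


def q_learning_actions(q, n_steps):
    """Q-learning: update with max over a', then choose from the updated Q."""
    s, taken = 0, []
    for _ in range(n_steps):
        a = greedy(q[s])  # chosen from the latest Q-values
        taken.append(a)
        s2, r = env_step(s, a)
        q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
        s = s2
    return taken


print(sarsa_actions([[1.0, 0.9]], 2))       # [0, 0]
print(q_learning_actions([[1.0, 0.9]], 2))  # [0, 1]
```

Both runs make the same first update (Q(0, 0) drops from 1.0 to 0.25), but SARSA had already committed to a' = 0, while Q-learning re-selects and takes action 1. So even with greedy selection the two can behave differently whenever the update changes the greedy action in the next state.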

In any case, both algorithms require exploration (i.e., taking actions different from the greedy action) to converge.
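One common way to get that exploration is epsilon-greedy selection. The answer does not name a specific scheme, so treat this as one possible sketch; the function name `epsilon_greedy` and its parameters are assumptions.

```python
import random


def epsilon_greedy(q_row, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action (explore),
    otherwise pick the action with the largest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])


q_row = [0.1, 0.7, 0.3]  # hypothetical Q-values for one state
print(epsilon_greedy(q_row, epsilon=0.0))  # 1, always the greedy action
print(epsilon_greedy(q_row, epsilon=0.1))  # usually 1, occasionally random
```

With `epsilon=0.0` this reduces to the purely greedy selection asked about in the question, which never tries an action whose current estimate is not the largest.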

The pseudocode of SARSA and Q-learning has been extracted from Sutton and Barto's book: Reinforcement Learning: An Introduction (HTML version)
