The difference between Q-learning and SARSA is that Q-learning's update bootstraps from the best possible action in the next state (off-policy), whereas SARSA's update bootstraps from the action actually taken in the next state by the current (e.g. epsilon-greedy) policy (on-policy).
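Here is a minimal sketch of the two tabular updates to make the difference concrete; the names (`Q`, `s`, `a`, `r`, `s2`, `a2`, `alpha`, `gamma`) are illustrative, not from any particular library:

```python
import numpy as np

# Q is a tabular action-value array of shape (n_states, n_actions),
# e.g. Q = np.zeros((n_states, n_actions)). All names here are illustrative.

def q_learning_update(Q, s, a, r, s2, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the *best* action in the next state s2,
    # regardless of which action the behavior policy will actually take.
    td_target = r + gamma * np.max(Q[s2])
    Q[s, a] += alpha * (td_target - Q[s, a])

def sarsa_update(Q, s, a, r, s2, a2, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action a2 the current policy
    # actually took in s2 (which may be an exploratory action).
    td_target = r + gamma * Q[s2, a2]
    Q[s, a] += alpha * (td_target - Q[s, a])
```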
If we used only the greedy policy there would be no exploration, so learning would fail. In the limiting case where epsilon decays to 0 (e.g. like 1/t), both SARSA and Q-learning converge to the optimal action-value function q*, and hence to the optimal policy. With epsilon held fixed, however, SARSA converges to the best epsilon-greedy policy, while Q-learning still converges to q*.
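As a sketch of that limiting case, here is one way to write epsilon-greedy action selection with a 1/t decay (the function and variable names are my own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(Q, s, t, n_actions):
    # Decaying epsilon (here 1/t) explores a lot early on but becomes
    # greedy in the limit -- the condition under which both SARSA and
    # Q-learning reach q*. A fixed epsilon never stops exploring, which
    # is why SARSA then only finds the best epsilon-greedy policy.
    epsilon = 1.0 / t  # t = 1, 2, 3, ... step or episode counter
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))  # explore: random action
    return int(np.argmax(Q[s]))              # exploit: greedy action
```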
I wrote a small note explaining the differences between the two; I hope it helps:
https://tcnguyen.github.io/reinforcement_learning/sarsa_vs_q_learning.html