Are Q-learning and SARSA with greedy selection equivalent?

星月不相逢 2021-02-09 11:08

The difference between Q-learning and SARSA is that Q-learning compares the current state and the best possible next state, whereas SARSA compares the current state against the

3 Answers
  •  情歌与酒
    2021-02-09 11:44

    If we use only the greedy policy there is no exploration, so learning is not guaranteed to work: the agent may never try the actions it would need in order to find a better policy. In the limiting case where epsilon decays to 0 (like 1/t, for example), SARSA and Q-learning both converge to the optimal action-value function q*, and hence to an optimal policy. With epsilon fixed, however, SARSA converges to the optimal epsilon-greedy policy, while Q-learning still converges to q*.
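
As a rough illustration of the point above, here is a minimal sketch comparing the two one-step bootstrap targets. The helper names and the numbers in `q_next` are made up for this example, and it only looks at a single update, not at a full training run. When epsilon is 0 the action SARSA takes next is the greedy one, so both targets coincide; as soon as the next action is exploratory, they differ.

```python
import random


def greedy_action(q_row):
    """Index of the largest action value (the first one wins on ties)."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


def epsilon_greedy_action(q_row, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return greedy_action(q_row)


def q_learning_target(reward, q_next, gamma):
    """Off-policy target: bootstrap from the best next action."""
    return reward + gamma * max(q_next)


def sarsa_target(reward, q_next, next_action, gamma):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * q_next[next_action]


rng = random.Random(0)
q_next = [0.2, 1.0, -0.5]  # hypothetical action values for the next state
reward, gamma = 1.0, 0.9

# epsilon = 0: the behaviour policy is purely greedy.
a_greedy = epsilon_greedy_action(q_next, 0.0, rng)
same_when_greedy = (
    sarsa_target(reward, q_next, a_greedy, gamma)
    == q_learning_target(reward, q_next, gamma)
)

# An exploratory (non-greedy) next action, as epsilon > 0 would sometimes pick.
a_explore = 2
differs_when_exploring = (
    sarsa_target(reward, q_next, a_explore, gamma)
    != q_learning_target(reward, q_next, gamma)
)

print(same_when_greedy, differs_when_exploring)
```

Note that matching targets on one step do not by themselves make the two algorithms identical over a whole run: SARSA chooses its next action before the update is applied, so the action values it acted on may already be stale.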

    I wrote a short note explaining the differences between the two; I hope it helps:

    https://tcnguyen.github.io/reinforcement_learning/sarsa_vs_q_learning.html
