Are Q-learning and SARSA with greedy selection equivalent?

星月不相逢 2021-02-09 11:08

The difference between Q-learning and SARSA is that Q-learning compares the current state and the best possible next state, whereas SARSA compares the current state against the

3 answers
  •  走了就别回头了
    2021-02-09 11:44

    If an optimal policy has already formed, SARSA with pure greedy selection and Q-learning are the same.

    However, during training we only have some policy, usually a sub-optimal one. SARSA with pure greedy selection will only converge to the "best" sub-optimal policy available without trying to explore the optimal one, while Q-learning will, because of the max_a Q(s', a) term in its update, which means it considers all actions available in the next state and picks the one with the maximum value.
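To make the comparison of the two update targets concrete, here is a minimal sketch, not a full training loop. The Q-table values, reward, and discount factor are made-up numbers, and the helper names (`greedy_action`, `q_learning_target`, `sarsa_target`) are hypothetical. It only shows that the two targets coincide when SARSA's next action is the greedy one and differ when it is not.

```python
import numpy as np

def greedy_action(q_row):
    """Index of the highest-valued action (ties go to the lowest index)."""
    return int(np.argmax(q_row))

def q_learning_target(Q, reward, next_state, gamma):
    # Off-policy target: bootstrap from the best estimated action in next_state.
    return reward + gamma * np.max(Q[next_state])

def sarsa_target(Q, reward, next_state, next_action, gamma):
    # On-policy target: bootstrap from the action actually selected in next_state.
    return reward + gamma * Q[next_state, next_action]

# Hypothetical table of value estimates: 3 states, 2 actions.
Q = np.array([[0.0, 1.0],
              [0.5, 0.2],
              [2.0, 3.0]])
reward, next_state, gamma = 1.0, 2, 0.9

greedy_next = greedy_action(Q[next_state])

q_target = q_learning_target(Q, reward, next_state, gamma)
# SARSA whose next action is chosen greedily: same bootstrap value as Q-learning.
sarsa_greedy = sarsa_target(Q, reward, next_state, greedy_next, gamma)
# SARSA whose next action is the non-greedy one (e.g. an exploratory step).
sarsa_other = sarsa_target(Q, reward, next_state, 0, gamma)

print("Q-learning target:      ", q_target)
print("SARSA target (greedy):  ", sarsa_greedy)
print("SARSA target (action 0):", sarsa_other)
```

The sketch covers a single update only. Whether the two algorithms end up with the same policy over a whole training run also depends on how actions are selected between updates, which is the point the answer above is making.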
