The difference between Q-learning and SARSA is that Q-learning's update compares the current state-action value against the best possible action in the next state, whereas SARSA's update compares it against the action actually taken in the next state under the current policy.
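To make the contrast concrete, the standard one-step update rules are (with learning rate $\alpha$ and discount factor $\gamma$):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big] \quad \text{(Q-learning)}$$

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \big] \quad \text{(SARSA)}$$

where $a_{t+1}$ is the action the policy actually selects in $s_{t+1}$.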
If an optimal policy has already formed, SARSA with a pure greedy policy and Q-learning are the same.
However, during training we only have a sub-optimal policy, so SARSA with a pure greedy policy will converge to the best policy reachable under that behavior without exploring any further, while Q-learning can still recover the optimal one because of the max operator in its target: the update considers all actions available in the next state and backs up the value of the best one, regardless of which action is actually taken.
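A minimal sketch of the two updates on a small tabular problem; the state/action sizes and hyperparameters below are illustrative assumptions, not from any particular source:

```python
import numpy as np

# Illustrative sizes and hyperparameters (assumptions for this sketch).
n_states, n_actions = 5, 2
alpha, gamma = 0.1, 0.99

Q = np.zeros((n_states, n_actions))

def q_learning_update(s, a, r, s_next):
    # Off-policy: the target maxes over all actions in the next state,
    # regardless of which action the behavior policy will actually take.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target uses the action actually taken in the next state.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])
```

The only difference is the target: `np.max(Q[s_next])` versus `Q[s_next, a_next]`. With a pure greedy policy the taken action is the argmax, so the two targets coincide; as soon as the policy explores (e.g. epsilon-greedy), they diverge.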