In reinforcement learning, what is the difference between policy iteration and value iteration?
As far as I understand, in value iteration you use the Bellman optimality equation to update the value function directly, whereas in policy iteration you alternate between evaluating the current policy and greedily improving it. Is that right, and if so, why would you prefer one over the other?
As far as I am concerned, and contrary to @zyxue's idea, VI is generally much faster than PI.
The reason is straightforward: as you noted, the Bellman expectation equation is used to solve for the value function of a *given* policy. Since we can solve for the value function of the optimal policy directly (via the Bellman optimality equation), solving for the value function of the current policy at every step is a waste of time.
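To make the contrast concrete, here is a minimal sketch on a hypothetical 2-state, 2-action MDP (the transition and reward numbers are made up for illustration): value iteration applies the Bellman optimality backup directly, while policy iteration alternates an exact policy evaluation (a linear solve) with greedy improvement. Both should arrive at the same optimal policy.

```python
import numpy as np

# Hypothetical MDP: P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality backup until the values stop changing."""
    V = np.zeros(2)
    while True:
        Q = R + gamma * P @ V            # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

def policy_iteration():
    """Alternate exact policy evaluation with greedy policy improvement."""
    policy = np.zeros(2, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi for the current policy.
        P_pi = P[np.arange(2), policy]
        R_pi = R[np.arange(2), policy]
        V = np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        new_policy = (R + gamma * P @ V).argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy

V_vi, pi_vi = value_iteration()
V_pi, pi_pi = policy_iteration()
print(pi_vi, pi_pi)  # both methods recover the same optimal policy
```

Note where the cost goes: policy iteration pays for a full evaluation (here a linear solve) on every sweep, whereas value iteration skips evaluating intermediate policies entirely.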
As for your question about the convergence of PI, I think you may be overlooking the fact that if you improve the strategy at each information state, then you improve the strategy for the whole game. This is also easy to prove if you are familiar with Counterfactual Regret Minimization: the sum of the regrets over the individual information states forms an upper bound on the overall regret, so minimizing the regret at each state minimizes the overall regret, which leads to the optimal policy.
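For reference, the bound I am alluding to is the one from Zinkevich et al.'s CFR analysis (stated here from memory, so check the paper for the exact notation): the average overall regret of player $i$ is bounded by the sum of the positive parts of the immediate counterfactual regrets over all information sets,

$$
R_i^T \;\le\; \sum_{I \in \mathcal{I}_i} R_{i,\mathrm{imm}}^{T,+}(I),
$$

where $R^{T,+} = \max(R^T, 0)$. Driving each per-state term to zero therefore drives the overall regret to zero.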