What is the difference between value iteration and policy iteration?

陌清茗 2021-01-29 17:44

In reinforcement learning, what is the difference between policy iteration and value iteration?

As much as I understand, in value iteration, you use t

5 answers
  •  抹茶落季
    2021-01-29 18:23

    The main difference in speed is due to the max operation in every iteration of value iteration (VI).

    In VI, each state's updated utility value comes from just one action (the one with the max utility value), but to find that action VI first has to evaluate the Bellman equation for every possible action.

    In policy iteration (PI), this max operation is omitted in step 1 (policy evaluation): the action is simply the one chosen by the intermediate policy.

    If there are N possible actions, VI has to evaluate the Bellman equation N times for each state and then take the max, whereas PI's evaluation step evaluates it only once per state (for the action given by the current policy).
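
    Written out, the two per-state updates compared above look roughly like this. The notation is an assumption, not taken from the question: P is the transition probability, R the reward, \gamma the discount factor, V_k the value estimate after k sweeps, and \pi a deterministic policy.

    ```latex
    % Value iteration: N action values per state, then a max
    V_{k+1}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma V_k(s') \bigr]

    % Policy evaluation (step 1 of PI): a single action value per state, no max
    V^{\pi}_{k+1}(s) = \sum_{s'} P(s' \mid s, \pi(s))\,\bigl[ R(s, \pi(s), s') + \gamma V^{\pi}_k(s') \bigr]
    ```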

    However, PI also has a policy improvement step that still uses the max operator and is as slow as a step in VI. Since PI converges in fewer iterations, this step does not happen as often as it does in VI.
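
To make the comparison above concrete, here is a minimal sketch of both algorithms on a tiny MDP. Everything in it is an assumption made for illustration: the 3-state, 2-action MDP and its numbers are invented, and `q_value`, `greedy_step`, `GAMMA` and `TOL` are hypothetical names, not part of any library.

```python
# Hypothetical 3-state, 2-action MDP; every number is made up for illustration.
# P[s][a] is a list of (probability, next_state, reward) triples.
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(0.9, 2, 1.0), (0.1, 1, 0.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}
GAMMA = 0.9  # discount factor (assumed)
TOL = 1e-8   # stopping threshold for a sweep (assumed)


def q_value(V, s, a):
    """One Bellman backup: expected return of taking action a in state s."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])


def greedy_step(V, s):
    """Back up every action in s; return (best_action, all action values)."""
    qs = {a: q_value(V, s, a) for a in P[s]}
    return max(qs, key=qs.get), qs


def value_iteration():
    V = {s: 0.0 for s in P}
    sweeps = 0
    while True:
        sweeps += 1
        delta = 0.0
        for s in P:
            best, qs = greedy_step(V, s)  # N backups, then the max
            delta = max(delta, abs(qs[best] - V[s]))
            V[s] = qs[best]
        if delta < TOL:
            break
    policy = {s: greedy_step(V, s)[0] for s in P}
    return V, policy, sweeps


def policy_iteration():
    V = {s: 0.0 for s in P}
    policy = {s: 0 for s in P}  # arbitrary starting policy
    eval_sweeps = improvements = 0
    while True:
        # Step 1, policy evaluation: one backup per state, no max.
        while True:
            eval_sweeps += 1
            delta = 0.0
            for s in P:
                v = q_value(V, s, policy[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < TOL:
                break
        # Step 2, policy improvement: uses the max, like one VI sweep.
        improvements += 1
        stable = True
        for s in P:
            best, qs = greedy_step(V, s)
            # Switch only on a strict improvement, so ties cannot oscillate.
            if qs[best] > qs[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return V, policy, eval_sweeps, improvements


vi_V, vi_policy, vi_sweeps = value_iteration()
pi_V, pi_policy, pi_eval_sweeps, pi_improvements = policy_iteration()
print("VI sweeps (each with a max over actions):", vi_sweeps)
print("PI improvement steps (each with a max):", pi_improvements)
print("PI evaluation sweeps (no max):", pi_eval_sweeps)
```

Both functions should end up with the same greedy policy. Note that the evaluation sweeps in PI are not free, so which method is faster overall depends on the problem and on how early the evaluation loop is stopped.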
