In reinforcement learning using feature approximation, does one have a single set of weights or a set of weights for each action?


Question


This question is an attempt to reframe an earlier question to make it clearer.

This slide shows an equation for Q(state, action) in terms of a set of weights and feature functions.
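
As I understand it, that equation has the standard linear form (my paraphrase in LaTeX, since the slide itself isn't reproduced here):

    Q(s, a) = \sum_i w_i \, f_i(s, a)

where each f_i is a feature function of both the state and the action, and the w_i are a single set of weights shared across all actions.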

These discussions (The Basic Update Rule and Linear Value Function Approximation) show a set of weights for each action.
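
Those discussions, by contrast, seem to use something like (again my paraphrase):

    Q(s, a) = \sum_i w_{a,i} \, f_i(s)

where the features f_i depend only on the state, and each action a has its own weight vector w_a.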

The reason they are different is that the first slide assumes you can anticipate the result of performing an action and then find features for the resulting states. (Note that the feature functions are functions of both the current state and the anticipated action.) In that case, the same set of weights can be applied to all the resulting features.

But in some cases one can't anticipate the effect of an action. What does one do then? Even with perfect weights, one can't apply them to the features of the resulting states if one can't anticipate those states.

My guess is that the second pair of slides deals with that problem. Instead of performing an action and then applying weights to the features of the resulting states, compute features of the current state and apply possibly different weights for each action.

Those are two very different ways of doing feature-based approximation. Are they both valid? The first one makes sense in situations like Taxi, in which one can effectively simulate what the environment will do in response to each action. But in other cases, e.g., cart-pole, that isn't possible or feasible. Then it would seem you need a separate set of weights for each action.
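
To make the contrast concrete, here is a minimal sketch of the two parameterizations as I understand them; the array shapes, constants, and function names are purely illustrative and not taken from the linked slides:

    import numpy as np

    # Illustrative sizes, not from the slides.
    NUM_FEATURES = 4
    NUM_ACTIONS = 2

    # Form 1: one shared weight vector over state-action features f(s, a).
    # This requires being able to compute features of the (state, action)
    # pair, e.g. by anticipating/simulating the action's effect (the Taxi case).
    w_shared = np.zeros(NUM_FEATURES)

    def q_shared(state_action_features):
        """Q(s, a) = w . f(s, a), same weights for every action."""
        return np.dot(w_shared, state_action_features)

    # Form 2: a separate weight vector per action over state features f(s).
    # The features depend only on the current state, so no model of the
    # action's effect is needed (the cart-pole case).
    w_per_action = np.zeros((NUM_ACTIONS, NUM_FEATURES))

    def q_per_action(state_features, action):
        """Q(s, a) = w_a . f(s), one weight vector per action."""
        return np.dot(w_per_action[action], state_features)

My understanding is that the second form is effectively the first form with state-action features that are zero except in the block corresponding to the chosen action, but I may be wrong about that.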

Is this the right way to think about it, or am I missing something?

Thanks.

Source: https://stackoverflow.com/questions/53398440/in-reinforcement-learning-using-feature-approximation-does-one-have-a-single-se
