Vowpal Wabbit - How to get prediction probabilities from contextual bandit model on a test sample

Posted by 生来就可爱ヽ(ⅴ<●) on 2019-12-05 16:50:17

When you use "--cb K", the prediction is the single best arm/action chosen by an argmax policy, which is a static policy.

When you use "--cb_explore K", the prediction output contains a probability for each arm/action. How those probabilities are calculated depends on the exploration policy you pick.
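As an illustration of how the policy shapes those probabilities, here is a minimal sketch of an epsilon-greedy distribution over K arms. It is written in plain Python and is not taken from Vowpal Wabbit's source. The function name `epsilon_greedy_pmf`, the 0-based arm indices, and the `epsilon` value of 0.1 are all assumptions made for the example.

```python
def epsilon_greedy_pmf(best_arm, num_arms, epsilon):
    """Return one probability per arm under epsilon-greedy exploration.

    Every arm receives epsilon / num_arms, and the remaining 1 - epsilon
    is added to the greedy (argmax) arm.
    """
    if not 0 <= best_arm < num_arms:
        raise ValueError("best_arm must be in [0, num_arms)")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    # Uniform exploration mass spread over all arms.
    pmf = [epsilon / num_arms] * num_arms
    # Exploitation mass goes to the arm the argmax policy would pick.
    pmf[best_arm] += 1.0 - epsilon
    return pmf


if __name__ == "__main__":
    # Illustrative values only: 3 arms, the third one is the greedy choice.
    print(epsilon_greedy_pmf(best_arm=2, num_arms=3, epsilon=0.1))
```

With `epsilon=0` all of the mass lands on the greedy arm, which corresponds to the single action that "--cb K" reports.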

If you send your test lines to a daemon running your model, you get exactly that: you send a context, and the reply is a probability distribution over the allowed actions, which is presumably the "recommendation" the model provides.

Say you have 3 actions, like in your example. Start a contextual bandits daemon:

vowpalwabbit/vw -d train.dat --cb_explore 3 -t --daemon --quiet --port 26542

Then send a context to it:

| a d d 

The reply is the per-action probability distribution you asked for.
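The same exchange can be scripted. The following is a minimal sketch of a client, assuming the daemon started above listens on `localhost:26542` and answers each example with one newline-terminated line. The helper name `query_vw_daemon` is hypothetical, and the reply is returned as raw text because its exact format is not shown here.

```python
import socket


def query_vw_daemon(example, host="localhost", port=26542, timeout=5.0):
    """Send one VW-format example line and return the daemon's one-line reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        # The daemon reads newline-delimited examples, so terminate the line.
        conn.sendall(example.rstrip("\n").encode("utf-8") + b"\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break  # peer closed the connection
            chunks.append(data)
            if b"\n" in data:
                break  # a full reply line has arrived
    return b"".join(chunks).decode("utf-8").strip()
```

Calling `query_vw_daemon("| a d d")` while the daemon is running should return the reply for that context as a string, which you can then split into per-action probabilities once you have checked the format your version emits.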
