Effect of --oaa 2 and --loss_function=logistic in Vowpal Wabbit


I have experienced something similar while using --csoaa. The details can be found here. My guess is that for a multiclass problem with N classes (even if you specify only 2 classes), vw effectively works with N copies of the features. The same example gets a different ft_offset value when it is predicted/learned for each possible class, and this offset is used in the hashing algorithm. So each class gets an "independent" set of features from the same row of the dataset. The feature values are of course identical, but vw doesn't store values - only feature weights - and the weights differ for each possible class. Since the amount of RAM used to store these weights is fixed by -b (-b 18 by default), the more classes you have, the higher the chance of a hash collision. You can try increasing the -b value and check whether the difference between the --oaa 2 and --binary results shrinks. But I might be wrong, as I haven't dug deep into the vw code.
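For example (just a sketch - train.vw and the model file names are placeholders for your own data), you could retrain both variants with a larger hash table and see whether the gap in reported average loss narrows as collisions become rarer:

vw -d train.vw --oaa 2 -b 24 -f model_oaa.vw        # 2^24 weight slots instead of the default 2^18
vw -d train.vw --binary -b 24 -f model_binary.vw

If the difference persists even with a much larger -b, hash collisions are probably not the (only) explanation.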

As for the loss function - you can't compare the avg loss values of the squared (default) and logistic loss functions directly. You would have to take the raw prediction values from the result obtained with squared loss and compute the loss of those predictions in terms of logistic loss. The function is: log(1 + exp(-label * prediction)), where label is the a priori known answer. Such functions (float getLoss(float prediction, float label)) for all loss functions implemented in vw can be found in loss_functions.cc. Or you can first scale the raw prediction value to [0..1] with 1.f / (1.f + exp(-prediction)) and then calculate the log loss as described on kaggle.com:

#include <cmath>

double val = 1.0 / (1.0 + exp(-prediction));      // sigmoid: y = f(x) -> (0, 1)
if (val < 1e-15) val = 1e-15;                     // clip to avoid log(0)
if (val > 1.0 - 1e-15) val = 1.0 - 1e-15;
double xx = (label < 0) ? 0.0 : 1.0;              // label {-1,1} -> {0,1}
double loss = xx * log(val) + (1.0 - xx) * log(1.0 - val);
loss *= -1;                                       // negative log-likelihood

You can also scale raw predictions to [0..1] with the /vowpal_wabbit/utl/logistic script or the --link=logistic parameter. Both use 1/(1+exp(-i)).
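For instance (a minimal sketch - train.vw and probs.txt are placeholder names), with --link=logistic the predictions vw writes out are already passed through the sigmoid, so they land in [0..1] and can be plugged straight into the log-loss formula above:

vw -d train.vw --loss_function=logistic --link=logistic -p probs.txt
# probs.txt should now hold sigmoid-transformed predictions in [0..1]

The utl/logistic script does the same transformation as a post-processing step, e.g. on a file of raw predictions saved with -r.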
