glmnet lasso ROC charts

会有一股神秘感。 Posted on 2020-01-06 07:34:07

Question


I am using k-fold cross-validation in glmnet (which implements lasso regression), but I can’t work out how to make ROC charts from the result.

library(glmnet)
glm_net <- cv.glmnet(dev_x_matrix,dev_y_vector,family="binomial",type.measure="class")
phat <- predict(glm_net,newx=val_x_matrix,s="lambda.min")

That gets me a vector of what look like logs of the fitted values. I tried to generate some ROC charts after this, but it did not work. I think it is because of the nature of the x and y objects that go into glmnet. Do you have any ideas?


Answer 1:


require("glmnet")

Just change the measure and you will get the AUC. It's not a ROC curve, but it summarizes the curve in a single number.

glm_net <- cv.glmnet(x, y, family="binomial", type.measure="auc")
plot(glm_net)

Here is an example from a model I'm training, just to show how it looks. By the way, the algorithm is extremely fast!

For more model visualization techniques, check out the ROCR package.
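
As a rough sketch of how the ROCR package could be combined with cv.glmnet to draw an actual ROC curve: the data below are simulated, and the names x, y, dev and val are placeholders rather than the question's objects. The key assumption is that predictions are requested on the probability scale with type = "response".

```r
library(glmnet)
library(ROCR)

set.seed(42)

# Simulated data (an assumption, standing in for the question's dev/val sets).
n <- 400
k <- 10
x <- matrix(rnorm(n * k), n, k)
y <- rbinom(n, 1, plogis(2 * x[, 1] - 2 * x[, 2] + x[, 3]))
dev <- 1:300    # rows used for fitting
val <- 301:400  # rows held out for the ROC curve

# Cross-validated lasso logistic regression, tuned on AUC.
fit <- cv.glmnet(x[dev, ], y[dev], family = "binomial", type.measure = "auc")

# Predicted probabilities (not log-odds) on the held-out rows.
prob <- as.vector(predict(fit, newx = x[val, ], s = "lambda.min", type = "response"))

# ROC curve with ROCR: true-positive rate against false-positive rate.
pred <- prediction(prob, y[val])
perf <- performance(pred, measure = "tpr", x.measure = "fpr")
plot(perf)
abline(0, 1, lty = 2)  # chance line for reference

# Area under the ROC curve on the held-out rows.
auc_val <- performance(pred, measure = "auc")@y.values[[1]]
print(auc_val)
```

The held-out AUC printed here is a separate quantity from the cross-validated AUC that plot(glm_net) displays, so the two numbers will generally differ a little.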




Answer 2:


I assume that you have binary observations in the set {0,1}.

You can convert the predicted values in phat, which are on the log-odds scale, to the [0, 1] range using the inverse logit (logistic) function:

phat_new = exp(phat)/(1+exp(phat))
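
A small check of the conversion above, using made-up log-odds values in place of the real phat: base R's plogis() computes the same inverse logit. The commented line at the end shows what is probably the more direct route with glmnet, which is to ask predict() for probabilities with type = "response".

```r
# Hypothetical log-odds values standing in for phat (link-scale predictions).
phat <- c(-3, -0.5, 0, 0.5, 3)

# Manual inverse logit, as written in the answer.
phat_manual <- exp(phat) / (1 + exp(phat))

# Base R equivalent from the stats package.
phat_plogis <- plogis(phat)

# Both routes should agree up to floating-point error.
print(all.equal(phat_manual, phat_plogis))

# With a fitted cv.glmnet object, probabilities can be requested directly:
# phat_new <- predict(glm_net, newx = val_x_matrix, s = "lambda.min", type = "response")
```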

You now have the predicted probabilities, phat_new; the true values of the observations, val_y_matrix; and the proportion of 1s in your validation data set, p (computed below). One way to plot the ROC is the following:

Fix t, the cut-off threshold (in [0,1]) for the model, and compute the following:

# t is the cut-off threshold fixed above (for example, t = 0.5)

# proportion of 1 observations in the validation set
p = mean(val_y_matrix == 1)

# joint proportion: the model predicts 1 while the true value is 0
p_01 = sum(phat_new >= t & val_y_matrix == 0) / length(val_y_matrix)

# joint proportion: the model predicts 1 while the true value is 1
p_11 = sum(phat_new >= t & val_y_matrix == 1) / length(val_y_matrix)

# false-positive rate
p_fp = p_01 / (1 - p)

# true-positive rate
p_tp = p_11 / p

# plot this point of the ROC; repeating for many values of t traces the curve
plot(p_fp, p_tp, xlim = c(0, 1), ylim = c(0, 1))
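
The computation above gives a single point for one value of t. A minimal sketch of sweeping t over all observed scores to trace the whole curve follows. The labels and probabilities are simulated, and the names y_true and p_hat are placeholders for val_y_matrix and phat_new.

```r
set.seed(1)

# Simulated stand-ins (assumptions): true 0/1 labels and predicted probabilities.
n <- 200
y_true <- rbinom(n, 1, 0.4)
p_hat <- plogis(2 * y_true - 1 + rnorm(n))

# Thresholds from high to low, so the curve runs from (0, 0) to (1, 1).
thresholds <- c(Inf, sort(unique(p_hat), decreasing = TRUE), -Inf)

# True-positive and false-positive rate at each threshold.
tpr <- sapply(thresholds, function(t) sum(p_hat >= t & y_true == 1) / sum(y_true == 1))
fpr <- sapply(thresholds, function(t) sum(p_hat >= t & y_true == 0) / sum(y_true == 0))

# Draw the ROC curve with a dashed chance line for reference.
plot(fpr, tpr, type = "l", xlab = "False-positive rate", ylab = "True-positive rate")
abline(0, 1, lty = 2)

# Area under the curve by the trapezoidal rule.
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
print(auc)
```

Using the observed scores as thresholds, rather than an evenly spaced grid, means every change in the curve is captured exactly once.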

I wonder if there is a better way of doing this, though. If you are using classification trees, for example, you can give the loss matrix as an input to the model, and the model you get will differ depending on the cost ratio of your loss matrix. This means that by changing the cost ratio you get different models, and those models correspond to different points on the ROC curve.



Source: https://stackoverflow.com/questions/11362974/glmnet-lasso-roc-charts
