glmnet lasso ROC charts

Question

I am using k-fold cross-validation in glmnet (which implements lasso regression), but I can't produce ROC charts from the result.

library(glmnet)
glm_net <- cv.glmnet(dev_x_matrix,dev_y_vector,family="binomial",type.measure="class")
phat <- predict(glm_net,newx=val_x_matrix,s="lambda.min")

That gives me a vector of what look like the logs of the fitted values. I tried to generate some ROC charts from it, but it did not work. I think the cause is the nature of the x and y objects that go into glmnet. Do you have any ideas?

Answer 1:

require("glmnet")

Just change type.measure to "auc" and cv.glmnet will report the cross-validated AUC. It is not a ROC curve, but it summarizes the curve in a single number.

glm_net <- cv.glmnet(x, y, family="binomial", type.measure="auc")
plot(glm_net)

Here is an example from a model I'm training, just to show how it looks. By the way, the algorithm is extremely fast!

For more model visualization techniques, check out the ROCR package.
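A minimal sketch of what that could look like with ROCR, assuming the package is installed. The labels and scores below are simulated stand-ins, not data from the question; in the question's setting they would be the true validation labels and the predictions for val_x_matrix.

```r
# Simulated stand-ins for validation labels and predicted scores (assumption)
set.seed(1)
labels <- rbinom(200, 1, 0.4)
scores <- plogis(rnorm(200, mean = 2 * labels - 1))

auc_value <- NA_real_
if (require("ROCR", quietly = TRUE)) {
  pred <- prediction(scores, labels)                      # pair scores with true labels
  perf <- performance(pred, measure = "tpr", x.measure = "fpr")
  plot(perf)                                              # ROC curve: TPR against FPR
  abline(0, 1, lty = 2)                                   # chance line for reference
  auc_value <- performance(pred, measure = "auc")@y.values[[1]]
}
```

ROCR only needs a score that ranks the observations, so the log-odds returned by predict() should work as scores without being converted to probabilities first.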

Answer 2:

I assume that you have binary observations in the set {0,1}.

By default, predict returns values on the log-odds (link) scale. You can convert the predicted values in phat to the [0, 1] range using the inverse logit (logistic) function:

phat_new = exp(phat)/(1+exp(phat))
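As a small sketch of the same conversion: base R's plogis() is the logistic function, so it should give the same result as the formula above. The phat values here are made-up log-odds standing in for the glmnet output, and the commented predict() call assumes the glm_net and val_x_matrix objects from the question.

```r
# Made-up log-odds standing in for predict(glm_net, newx = val_x_matrix, s = "lambda.min")
phat <- c(-3, -0.5, 0, 0.5, 3)

phat_manual <- exp(phat) / (1 + exp(phat))   # formula from the answer
phat_plogis <- plogis(phat)                  # built-in logistic function

# Both routes agree up to floating-point tolerance
stopifnot(isTRUE(all.equal(phat_manual, phat_plogis)))

# With glmnet, asking for the response scale returns probabilities directly:
# phat_new <- predict(glm_net, newx = val_x_matrix, s = "lambda.min", type = "response")
```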

Now you know the predicted probability, phat_new, the true values of the observations, val_y_matrix, and the proportion of 1s in your validation data set, p. One way to plot the ROC is the following:

Fix t, the cut-off threshold (in [0,1]) for the model, and compute the following:

# example cut-off; any value in [0, 1] can be used
t = 0.5

# proportion of 1 observations in the validation set
p = mean(val_y_matrix == 1)

# joint proportion: model predicts 1 while the true value of the observation is 0
p_01 = sum(phat_new >= t & val_y_matrix == 0) / length(val_y_matrix)

# joint proportion: model predicts 1 while the true value of the observation is 1
p_11 = sum(phat_new >= t & val_y_matrix == 1) / length(val_y_matrix)

# false-positive rate
p_fp = p_01 / (1 - p)

# true-positive rate
p_tp = p_11 / p

# plot one point of the ROC; repeating this over a grid of t values traces the full curve
plot(p_fp, p_tp, xlim = c(0, 1), ylim = c(0, 1))
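The steps above give a single point for one value of t. A hedged sketch of the full sweep in base R follows, using simulated labels and probabilities in place of val_y_matrix and phat_new; the helper name roc_point and the 0.01 threshold grid are illustrative choices, not part of the original answer.

```r
# Simulated stand-ins for the validation labels and predicted probabilities (assumption)
set.seed(42)
n <- 500
val_y <- rbinom(n, 1, 0.3)
phat_new <- plogis(rnorm(n, mean = 1.5 * val_y - 1))

# False-positive and true-positive rate at one cut-off (hypothetical helper)
roc_point <- function(cutoff, prob, y) {
  c(fpr = sum(prob >= cutoff & y == 0) / sum(y == 0),
    tpr = sum(prob >= cutoff & y == 1) / sum(y == 1))
}

# Sweep the cut-off from 1 down to 0 so the curve runs from (0, 0) to (1, 1)
thresholds <- seq(1, 0, by = -0.01)
roc <- t(sapply(thresholds, roc_point, prob = phat_new, y = val_y))

# Area under the curve by the trapezoidal rule
fpr <- roc[, "fpr"]
tpr <- roc[, "tpr"]
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

plot(fpr, tpr, type = "l",
     xlab = "False-positive rate", ylab = "True-positive rate")
abline(0, 1, lty = 2)  # chance line for reference
```

A finer threshold grid, or using the sorted unique predicted values as cut-offs, would give a smoother curve at the cost of more evaluations.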

I wonder if there is a better way of doing this, though. If you are using classification trees, for example, you can pass a loss matrix to the model, and the fitted model will differ depending on the cost ratio in that matrix. By varying the cost ratio you get different models, and each model is a different point on the ROC curve.

Source: https://stackoverflow.com/questions/11362974/glmnet-lasso-roc-charts

Tags

roc

glmnet

lasso-regression