Question
I am using the glmnet package to perform a LASSO regression. I am now working on feature importance using the caret package. What I don't understand is the value of the importance. Could anyone enlighten me? Is there any formula to calculate these values or does that mean that these values are based on the beta values?
ROC curve variable importance

  only 7 most important variables shown (out of 25)

         Importance
feature1     0.8974
feature2     0.8962
feature3     0.8957
feature4     0.8744
feature5     0.8701
feature6     0.8658
feature7     0.8253
Answer 1:
For glmnet models, caret takes the coefficients of the final fit, drops the intercept, and takes their absolute values. Those absolute coefficients are stored as the variable importance and are what the predictors are ranked by.
To view the source code, run:
getModelInfo("glmnet")$glmnet$varImp
To summarize, these are the lines to calculate it:
function(object, lambda = NULL, ...) {
  ## skipping a few lines
  beta <- predict(object, s = lambda, type = "coef")
  if (is.list(beta)) {
    out <- do.call("cbind", lapply(beta, function(x) x[, 1]))
    out <- as.data.frame(out)
  } else out <- data.frame(Overall = beta[, 1])
  out <- abs(out[rownames(out) != "(Intercept)", , drop = FALSE])
  out
}
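To see what those last lines do without fitting a model, here is a minimal base R sketch. It assumes a single-response model and uses a small dense matrix as a stand-in for the sparse coefficient matrix that predict() returns; the feature names x1 to x3 and their values are made up for illustration.

```r
# Toy coefficient column standing in for
# predict(object, s = lambda, type = "coef").
# Names and values are hypothetical, not from a real fit.
beta <- matrix(
  c(0.50, -1.20, 0.00, 0.30),
  ncol = 1,
  dimnames = list(c("(Intercept)", "x1", "x2", "x3"), "s1")
)

# Same steps as the single-response branch of the function above:
# take the first column, drop the intercept, take absolute values.
out <- data.frame(Overall = beta[, 1])
out <- abs(out[rownames(out) != "(Intercept)", , drop = FALSE])

# Sort from largest to smallest absolute coefficient.
out <- out[order(out$Overall, decreasing = TRUE), , drop = FALSE]
print(out)
```

A coefficient that LASSO shrinks to exactly zero (x2 here) ends up with an importance of zero, and the sign of a coefficient does not affect its rank.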
Source: https://stackoverflow.com/questions/37540837/caret-package-glmnet-variable-importance