Question
I want to calculate variable importance for a glmnet model in R. I am using the glmnet package to fit the elastic net model like this:
library(glmnet)
library(caret)
library(vip)
data_y <- as.vector(mtcars$mpg)
data_x <- as.matrix(mtcars[-1])
fit.glmnet <- glmnet(data_x, data_y, family="gaussian")
set.seed(123)
cvfit.glmnet <- cv.glmnet(data_x, data_y, standardize = TRUE)
cvfit.glmnet$lambda.min
coef(cvfit.glmnet, s = "lambda.min")
Then I used the vip package for variable importance:
#Using vip package
vip::vi_model(cvfit.glmnet, s = cvfit.glmnet$lambda.min)
which returns:
# A tibble: 10 x 3
   Variable Importance Sign
   <chr>         <dbl> <chr>
 1 cyl         -0.886  NEG
 2 disp         0      NEG
 3 hp          -0.0117 NEG
 4 drat         0      NEG
 5 wt          -2.71   NEG
 6 qsec         0      NEG
 7 vs           0      NEG
 8 am           0      NEG
 9 gear         0      NEG
10 carb         0      NEG
The reported importance values are signed coefficients rather than non-negative scores, and they are not scaled to the 0-1 or 0-100% range.
Then I tried the customised function from this answer:
#Using function provided in this example
varImp <- function(object, lambda = NULL, ...) {
  ## skipping a few lines
  beta <- predict(object, s = lambda, type = "coef")
  if(is.list(beta)) {
    out <- do.call("cbind", lapply(beta, function(x) x[,1]))
    out <- as.data.frame(out)
  } else out <- data.frame(Overall = beta[,1])
  out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
  out
}
varImp(cvfit.glmnet, lambda = cvfit.glmnet$lambda.min)
It returns the following output:
        Overall
cyl  0.88608541
disp 0.00000000
hp   0.01168438
drat 0.00000000
wt   2.70814703
qsec 0.00000000
vs   0.00000000
am   0.00000000
gear 0.00000000
carb 0.00000000
Although the output from the customised function contains no negative values, it still does not vary within 0-1 or 0-100%.
I know that the caret package has a varImp function which gives variable importance between 0 and 100%. But I want to do the same thing for a cv.glmnet object instead of a caret::train object. How can I get variable importance like that of the caret package for a cv.glmnet object?
Answer 1:
The question asks how to obtain glmnet variable importance between 0 and 100%.
If importance is to be assigned from coefficient magnitude at a given (usually optimal) penalty, and the coefficients come from a fit on standardized variables (standardization is the default in glmnet), then the coefficients can simply be scaled to the 0-1 range.
A slightly modified version of the function is:
varImp <- function(object, lambda = NULL, ...) {
  # coefficients (including the intercept) at the requested penalty
  beta <- predict(object, s = lambda, type = "coef")
  if(is.list(beta)) {
    out <- do.call("cbind", lapply(beta, function(x) x[,1]))
    out <- as.data.frame(out)
  } else out <- data.frame(Overall = beta[,1])
  # drop the intercept and keep absolute values
  out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
  # scale so that the largest coefficient has importance 1
  out <- out/max(out)
  out[order(out$Overall, decreasing = TRUE),,drop=FALSE]
}
Using the example in the question:
varImp(cvfit.glmnet, lambda = cvfit.glmnet$lambda.min)
#output
         Overall
wt   1.000000000
cyl  0.320796270
am   0.004840186
hp   0.004605913
disp 0.000000000
drat 0.000000000
qsec 0.000000000
vs   0.000000000
gear 0.000000000
carb 0.000000000
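Since the question asks for a 0-100% scale, the same idea can be written as a small helper that works on a plain named vector of coefficients. This is only a sketch: scale_importance is a hypothetical name, not part of glmnet, caret or vip, and the input values are the absolute lambda.min coefficients printed in the question rather than a fresh fit.

```r
# Hypothetical helper (not a glmnet/caret function): rescale the absolute
# coefficients so that the largest one gets an importance of 100.
scale_importance <- function(beta) {
  imp <- abs(beta)
  top <- max(imp)
  if (top == 0) return(imp)  # every coefficient was shrunk to zero
  100 * imp / top
}

# Absolute coefficients at lambda.min, copied from the output in the question.
# With a fitted object they could be extracted (intercept dropped) with
#   beta <- as.matrix(coef(cvfit.glmnet, s = "lambda.min"))[-1, 1]
beta <- c(cyl = 0.88608541, disp = 0, hp = 0.01168438, drat = 0,
          wt = 2.70814703, qsec = 0, vs = 0, am = 0, gear = 0, carb = 0)

imp <- sort(scale_importance(beta), decreasing = TRUE)
print(round(imp, 2))
```

One caveat: glmnet reports coefficients on the original scale of the predictors even when standardize = TRUE, so the magnitudes are only comparable across variables if the predictors are on a common scale (for example, after scaling data_x before fitting).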
Another approach to assigning variable importance for glmnet models is to score the variables by the penalty at which they are included: a variable is more important if it stays in the model at higher penalties, that is, if a larger penalty is needed to exclude it. This approach will be implemented in the mlr3 package at some point: https://github.com/mlr-org/mlr3learners/issues/28
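A rough sketch of that idea with glmnet alone, assuming the regularization path from the question's data: for each variable, take the largest lambda on the path at which its coefficient is nonzero. The name entry_lambda is made up for this illustration, and this is not the mlr3 implementation.

```r
library(glmnet)

x <- as.matrix(mtcars[-1])  # predictors (all columns except mpg)
y <- mtcars$mpg

# Fit the full regularization path (lambda runs from large to small)
fit <- glmnet(x, y, family = "gaussian")

# fit$beta has one row per variable and one column per lambda value
beta <- as.matrix(fit$beta)

# Hypothetical score: the largest penalty at which a variable is still in
# the model; 0 if it never enters along the path
entry_lambda <- apply(beta != 0, 1, function(nz) {
  if (any(nz)) max(fit$lambda[nz]) else 0
})

print(sort(entry_lambda, decreasing = TRUE))
```

Variables with a larger entry_lambda survive stronger shrinkage. Because glmnet standardizes the predictors by default, the ordering does not depend on the units of each column.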
Source: https://stackoverflow.com/questions/63989057/discripencies-in-variable-importance-calculation-for-glmnet-model-in-r