glmnet - variable importance?

旧巷少年郎 2021-02-09 15:35

I'm using the glmnet package to perform a LASSO regression. Is there a way to get the importance of the individual variables that were selected? I thought about ranking the coef

3 Answers
  • 2021-02-09 16:02

    This is how the caret package does it.

    To summarize, you can take the absolute value of the final coefficients and rank them. The ranked coefficients are your variable importance.

    To view the source code, you can type

    caret::getModelInfo("glmnet")$glmnet$varImp
    

    If you don't want to use the caret package, you can run the following lines taken from the package, and it should work.

    varImp <- function(object, lambda = NULL, ...) {
    
      ## skipping a few lines
    
      beta <- predict(object, s = lambda, type = "coef")
      if(is.list(beta)) {
        out <- do.call("cbind", lapply(beta, function(x) x[,1]))
        out <- as.data.frame(out, stringsAsFactors = TRUE)
      } else out <- data.frame(Overall = beta[,1])
      out <- abs(out[rownames(out) != "(Intercept)",,drop = FALSE])
      out
    }
    

    Finally, call the function with your fit.

    varImp(cvfit, lambda = cvfit$lambda.min)
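    
    The function above returns the absolute coefficients in model order, not sorted. A minimal sketch of the ranking step follows, assuming the result is a data frame with an `Overall` column as in the single-response case. The helper name `rank_importance` and the example values are hypothetical and only there for illustration.

    ```r
    # Hypothetical helper: keep only the variables LASSO selected
    # (non-zero coefficient) and sort them by importance, largest first.
    rank_importance <- function(imp) {
      selected <- imp[imp$Overall > 0, , drop = FALSE]
      selected[order(selected$Overall, decreasing = TRUE), , drop = FALSE]
    }

    # Illustrative stand-in for the output of varImp(); not real model output.
    imp <- data.frame(
      Overall = c(0.8, 0, 2.5, 0.1),
      row.names = c("x1", "x2", "x3", "x4")
    )

    ranked <- rank_importance(imp)
    print(ranked)
    ```

    With a real fit, `imp` would be the value returned by `varImp(cvfit, lambda = cvfit$lambda.min)`.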
    
  • 2021-02-09 16:02

    It's pretty easy to use the contents of the cv.glmnet object to create an ordered list of coefficients...

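    library(dplyr)  # provides %>%, arrange() and desc()

    # Pull the non-zero coefficients out of the sparse matrix returned by coef()
    coefList <- coef(cv.glmnet.MOD, s='lambda.1se')
    coefList <- data.frame(coefList@Dimnames[[1]][coefList@i+1],coefList@x)
    names(coefList) <- c('var','val')

    # Sort by absolute value, largest first, and show the top 25 rows
    coefList %>%
      arrange(desc(abs(val))) %>%
      head(25)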
    

    NOTE: As other posters have commented, to get a like-for-like comparison you need to scale (z-score) your numeric variables before the modelling step. Otherwise a variable with a very small scale, e.g. range (0, 1), can be assigned a large coefficient when it sits in a model alongside variables with very large scales, e.g. range (-10000, 10000). The coefficient magnitudes are then not comparable, and ranking them is meaningless in most contexts.
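
    A minimal sketch of the z-scoring step, using base R's `scale()` on simulated predictors. The data and the names `x`, `x_std`, `small` and `large` are made up for illustration. Note that glmnet standardizes internally by default (`standardize = TRUE`) but reports coefficients back on the original scale, which is why scaling the inputs yourself changes what the reported coefficients mean.

    ```r
    set.seed(42)
    n <- 200

    # Two simulated predictors on very different scales (illustrative only)
    x <- cbind(
      small = runif(n, 0, 1),
      large = runif(n, -10000, 10000)
    )

    # Centre each column to mean 0 and rescale to standard deviation 1
    x_std <- scale(x)

    col_means <- colMeans(x_std)
    col_sds <- apply(x_std, 2, sd)
    print(round(col_means, 10))
    print(col_sds)

    # x_std would then replace x in the fit, e.g. glmnet::cv.glmnet(x_std, y),
    # so that the coefficients are expressed per standard deviation.
    ```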

  • 2021-02-09 16:16

    Before you compare the magnitudes of the coefficients, you should normalize them by multiplying each coefficient by the standard deviation of the corresponding predictor. This answer has more detail and useful links: https://stats.stackexchange.com/a/211396/34615
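
    As a rough sketch of that normalization, assuming the coefficients are available as a named numeric vector and the predictors as a matrix with matching column names. The helper `standardize_coefs`, the simulated data and the coefficient values are all hypothetical.

    ```r
    # Hypothetical helper: multiply each coefficient by the standard deviation
    # of its predictor, then sort by the absolute scaled value.
    standardize_coefs <- function(beta, x) {
      beta <- beta[names(beta) != "(Intercept)"]        # intercept has no predictor
      sds <- apply(x[, names(beta), drop = FALSE], 2, sd)
      out <- data.frame(
        var = names(beta),
        coef = unname(beta),
        scaled = unname(beta * sds),
        stringsAsFactors = FALSE
      )
      out[order(-abs(out$scaled)), ]
    }

    set.seed(1)
    x <- cbind(
      small = runif(100, 0, 1),
      large = runif(100, -10000, 10000)
    )

    # Made-up coefficients: "small" has the larger raw coefficient
    beta <- c("(Intercept)" = 0.5, small = 2, large = 0.001)

    res <- standardize_coefs(beta, x)
    print(res)
    ```

    Here `small` has the larger raw coefficient, but `large` ranks first once each coefficient is multiplied by its predictor's standard deviation. With a glmnet fit, `beta` could be taken from `coef(cvfit, s = "lambda.min")[, 1]`.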
