Extracting coefficient variable names from glmnet into a data.frame

小鲜肉 2020-12-05 00:09

I would like to extract the glmnet generated model coefficients and create a SQL query from them. The function coef(cv.glmnet.fit) yields a 'dgCMa

8 Answers
  • 2020-12-05 00:44

    Here is a reproducible example in which I fit a binary (logistic) model using cv.glmnet. A glmnet model fit will also work. At the end of the example, I assemble the non-zero coefficients and their associated features into a data.frame called myResults:

    library(glmnet)
    X <- matrix(rnorm(100*10), 100, 10);
    X[51:100, ] <- X[51:100, ] + 0.5; #artificially introduce difference in control cases
    rownames(X) <- paste0("observation", 1:nrow(X));
    colnames(X) <- paste0("feature",     1:ncol(X));
    
    y <- factor( c(rep(1,50), rep(0,50)) ); #binary outcome class label
    y
    ## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    ## [51] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    ## Levels: 0 1
    
    ## Perform logistic model fit:
    fit1 <- cv.glmnet(X, y, family="binomial", nfolds=5, type.measure="auc"); #with K-fold cross validation
    # fit1 <- glmnet(X, y, family="binomial") #without cross validation also works
    
    ## Adapted from @Mehrad Mahmoudian:
    myCoefs <- coef(fit1, s="lambda.min");
    myCoefs[which(myCoefs != 0 ) ]               #coefficients: intercept included
    ## [1]  1.4945869 -0.6907010 -0.7578129 -1.1451275 -0.7494350 -0.3418030 -0.8012926 -0.6597648 -0.5555719
    ## [10] -1.1269725 -0.4375461
    myCoefs@Dimnames[[1]][which(myCoefs != 0 ) ] #feature names: intercept included
    ## [1] "(Intercept)" "feature1"    "feature2"    "feature3"    "feature4"    "feature5"    "feature6"   
    ## [8] "feature7"    "feature8"    "feature9"    "feature10"  
    
    ## Assemble into a data.frame
    myResults <- data.frame(
      features = myCoefs@Dimnames[[1]][ which(myCoefs != 0 ) ], #intercept included
      coefs    = myCoefs              [ which(myCoefs != 0 ) ]  #intercept included
    )
    myResults
    ##       features      coefs
    ## 1  (Intercept)  1.4945869
    ## 2     feature1 -0.6907010
    ## 3     feature2 -0.7578129
    ## 4     feature3 -1.1451275
    ## 5     feature4 -0.7494350
    ## 6     feature5 -0.3418030
    ## 7     feature6 -0.8012926
    ## 8     feature7 -0.6597648
    ## 9     feature8 -0.5555719
    ## 10    feature9 -1.1269725
    ## 11   feature10 -0.4375461
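
Since the question asks for a SQL query, here is a minimal sketch of one way to turn a table shaped like myResults into a scoring expression. The helper name coefs_to_sql, the table name my_table, and the alias linear_predictor are hypothetical; the three coefficient values are copied from the output above, and the sketch assumes the database columns carry the same names as the features.

```r
# Hypothetical stand-in for myResults (first three rows of the output above).
myResults <- data.frame(
  features = c("(Intercept)", "feature1", "feature2"),
  coefs    = c(1.4945869, -0.6907010, -0.7578129),
  stringsAsFactors = FALSE
)

# Build "intercept + (coef * column) + ..." as a SQL SELECT statement.
# Assumes feature names are valid, unquoted SQL column names.
coefs_to_sql <- function(res, table = "my_table") {
  is_intercept <- res$features == "(Intercept)"
  intercept <- if (any(is_intercept)) res$coefs[is_intercept] else 0
  terms <- sprintf("(%s * %s)",
                   sprintf("%.10g", res$coefs[!is_intercept]),
                   res$features[!is_intercept])
  expr <- paste(c(sprintf("%.10g", intercept), terms), collapse = " + ")
  sprintf("SELECT %s AS linear_predictor FROM %s;", expr, table)
}

query <- coefs_to_sql(myResults)
cat(query, "\n")
```

For a binomial fit this expression is the linear predictor (log-odds); a probability would additionally need the logistic transform, for example 1 / (1 + EXP(-linear_predictor)) in SQL dialects that provide EXP.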
    
  • 2020-12-05 00:46

    Check the broom package. Its tidy() function converts the output of many R objects (including glmnet fits) into data.frames.
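
A minimal sketch of the broom route, assuming the glmnet and broom packages are installed. The simulated data mirror the first answer, and the seed and the object names cvfit, path and at_min are arbitrary. As far as I know, tidy() on a cv.glmnet object summarises the cross-validation curve rather than the coefficients, so the sketch tidies the underlying glmnet.fit and then keeps the rows at lambda.min.

```r
library(glmnet)
library(broom)

set.seed(1)  # arbitrary seed, only so the sketch is repeatable
X <- matrix(rnorm(100 * 10), 100, 10)
X[51:100, ] <- X[51:100, ] + 0.5
colnames(X) <- paste0("feature", 1:ncol(X))
y <- factor(c(rep(1, 50), rep(0, 50)))

cvfit <- cv.glmnet(X, y, family = "binomial", nfolds = 5)

# tidy() on the underlying glmnet fit: one row per non-zero
# coefficient per lambda value on the regularisation path.
path <- tidy(cvfit$glmnet.fit)

# Keep only the rows belonging to the cross-validated lambda.min.
at_min <- path[abs(path$lambda - cvfit$lambda.min) < 1e-12,
               c("term", "estimate")]
at_min
```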

  • 2020-12-05 00:47

    One approach is to call coef() on the glmnet() object (your model). In the example below, the index [[1]] selects the first outcome class of a multinomial logistic regression; for other model types you should probably remove it.

    coef_names_GLMnet <- coef(GLMnet, s = 0)[[1]]
    row.names(coef_names_GLMnet)[coef_names_GLMnet@i+1]
    

    The row.names() indices need to be incremented (+1) because the @i slot of the coef() object numbers the rows (intercept and data features) starting from 0, whereas the character vector of row names is indexed starting from 1.
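
To cover every outcome class rather than only the first, the same idea can be applied to each element of the list that coef() returns for a multinomial fit. This is a sketch on simulated data; the class labels, the seed, the value s = 0.01 and the object names are all arbitrary assumptions.

```r
library(glmnet)

set.seed(42)  # arbitrary seed, only so the sketch is repeatable
X <- matrix(rnorm(150 * 5), 150, 5)
colnames(X) <- paste0("feature", 1:ncol(X))
y <- factor(rep(c("a", "b", "c"), each = 50))
X[y == "b", 1] <- X[y == "b", 1] + 1  # make class b differ on feature1
X[y == "c", 2] <- X[y == "c", 2] + 1  # make class c differ on feature2

fit <- glmnet(X, y, family = "multinomial")

# For a multinomial model, coef() returns a named list with one
# one-column sparse matrix per outcome class.
cf <- coef(fit, s = 0.01)

res <- do.call(rbind, lapply(names(cf), function(cls) {
  m <- cf[[cls]]
  data.frame(
    class       = rep(cls, length(m@x)),
    name        = rownames(m)[m@i + 1],  # @i is zero-based, hence + 1
    coefficient = m@x,
    stringsAsFactors = FALSE
  )
}))
res
```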

  • 2020-12-05 00:52

    UPDATE: Both of the first two comments on my answer are right. I have kept the old answer below the line just for posterity.

    The following answer is short, works, and does not need any other package:

    tmp_coeffs <- coef(cv.glmnet.fit, s = "lambda.min")
    data.frame(name = tmp_coeffs@Dimnames[[1]][tmp_coeffs@i + 1], coefficient = tmp_coeffs@x)
    

    The reason for the +1 is that the @i slot stores zero-based row indices (the intercept is row 0), whereas @Dimnames[[1]] is indexed starting at 1.
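
The zero-based @i slot can be seen without fitting a model at all. The sketch below builds a small sparse column by hand with the Matrix package as a hypothetical stand-in for the output of coef(); the names and values are made up for illustration.

```r
library(Matrix)

# Hypothetical stand-in for coef(cv.glmnet.fit, s = "lambda.min"):
# a one-column sparse matrix in which x1 and x3 were shrunk to zero.
m <- sparseMatrix(
  i = c(1, 3, 5), j = c(1, 1, 1), x = c(1.5, -0.7, 0.2),
  dims = c(5, 1),
  dimnames = list(c("(Intercept)", "x1", "x2", "x3", "x4"), "s1")
)

m@i  # zero-based row positions of the stored values: 0 2 4
m@x  # the stored (non-zero) values: 1.5 -0.7 0.2

# Shift the positions by one to index the row names.
res <- data.frame(name = rownames(m)[m@i + 1], coefficient = m@x,
                  stringsAsFactors = FALSE)
res
```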


    OLD ANSWER: (only kept for posterity) Try these lines:

    The non zero coefficients:

    coef(cv.glmnet.fit, s = "lambda.min")[which(coef(cv.glmnet.fit, s = "lambda.min") != 0)]
    

    The features that are selected:

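    # Take the names from the coefficient matrix itself: its first row is the
    # intercept, so positions do not line up with colnames(regression_data).
    rownames(coef(cv.glmnet.fit, s = "lambda.min"))[which(coef(cv.glmnet.fit, s = "lambda.min") != 0)]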
    

    Putting them together as a data frame is then straightforward, but let me know if you want that part of the code as well.


  • 2020-12-05 00:53
    # Requires tibble (rownames_to_column) and magrittr (%>%).
    library(tibble)
    library(magrittr)
    tidy_coef <- function(x){
        coef(x) %>%
        as.matrix %>%   # Coerce from sparse matrix to regular matrix, keeping row names.
        data.frame %>%  # Then to a data frame.
        rownames_to_column %>%  # Add row names as an explicit column.
        setNames(c("term","estimate"))
    }
    

    Without tibble:

    tidy_coef2 <- function(x){
        x <- coef(x)
        data.frame(term=rownames(x),
                   estimate=matrix(x)[,1],
                   stringsAsFactors = FALSE)
    }
    
  • 2020-12-05 00:56

    The names should be accessible as dimnames(coef(cv.glmnet.fit))[[1]], so the following should put both coefficient names and values into a data.frame:

    data.frame(coef.name = dimnames(coef(GLMNET))[[1]], coef.value = matrix(coef(GLMNET)))
