Adding labels on curves in glmnet plot in R

伪装坚强ぢ 2020-12-06 12:36

I am using the glmnet package to get the following graph from the mtcars dataset (regression of mpg on the other variables):

library(glmnet)
fit = glmnet(as.matrix(mtcars[-1         


        
3 Answers
  • 2020-12-06 13:13

    Here is a modification of the best answer that connects each curve to its label with a line segment instead of printing the labels directly on top of the curves. This is especially useful when there are lots of variables and you only want to label those whose coefficients are nonzero at lambda.min:

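    # Note: the argument 'lra' is a cv.glmnet object.
    lbs_fun <- function(lra, ...) {

      fit <- lra$glmnet.fit

      # column of the coefficient path that corresponds to lambda.min
      L <- which(fit$lambda == lra$lambda.min)

      # nonzero coefficients at lambda.min, sorted so the segments do not cross
      ystart <- sort(fit$beta[abs(fit$beta[, L]) > 0, L])
      labs <- names(ystart)

      # spread the labels over the coefficient range at the smallest lambda;
      # use the last column, since the path can stop before 100 lambdas
      r <- range(fit$beta[, ncol(fit$beta)])
      yfin <- seq(r[1], r[2], length.out = length(ystart))

      xstart <- log(lra$lambda.min)
      xfin <- xstart + 1

      text(xfin + 0.3, yfin, labels = labs, ...)
      segments(xstart, ystart, xfin, yfin)
    }

    plot(lra$glmnet.fit, label = FALSE, xvar = "lambda", xlim = c(-5.2, 0), lwd = 2) # xlim and lwd are optional
    lbs_fun(lra) # draw the segments and labels on the existing plot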
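
    To check in advance which variables the function above will label, you can pull the coefficients at lambda.min with coef(). This is only a sketch: the cv.glmnet call on mtcars and the names x, y, b and nonzero are my assumptions, not part of the answer.

    ```r
    library(glmnet)

    set.seed(1)                        # cv.glmnet assigns folds at random
    x <- as.matrix(mtcars[-1])         # predictors (assumed: same data as the question)
    y <- mtcars$mpg                    # response
    lra <- cv.glmnet(x, y)

    # coefficients at lambda.min as a named vector, intercept dropped
    b <- as.matrix(coef(lra, s = "lambda.min"))[-1, 1]

    # the variables that would get a label: nonzero at lambda.min
    nonzero <- sort(b[b != 0])
    print(nonzero)
    ```

    Using coef() with s = "lambda.min" avoids comparing floating-point lambdas with ==, which is the one step in the function that I would treat with some care.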
    
  • 2020-12-06 13:30

    An alternative is the plot_glmnet function in the plotmo package. It automatically positions the variable names and has a few other bells and whistles. For example, the following code

    library(glmnet)
    mod <- glmnet(as.matrix(mtcars[-1]), mtcars[,1])
    library(plotmo) # for plot_glmnet
    plot_glmnet(mod)
    

    gives

    [image: plot]

    The variable names are spread out to prevent overplotting, but we can still make out which curve is associated with which variable. Further examples may be found in Chapter 6 of the plotres vignette, which is included in the plotmo package.
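
    With many variables you may not want every name printed. As far as I know, plot_glmnet takes a label argument giving the number of names to show (default 10), so a minimal sketch, assuming that argument behaves this way in your plotmo version, would be:

    ```r
    library(glmnet)
    library(plotmo) # for plot_glmnet

    mod <- glmnet(as.matrix(mtcars[-1]), mtcars[, 1])

    # label only the 5 variables with the largest final coefficients
    # (assumption: 'label' is a count of names to display)
    plot_glmnet(mod, label = 5)
    ```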

  • 2020-12-06 13:32

    As the labels are hard-coded, it is perhaps easier to write a quick function. This is just a first attempt, so it can be made more thorough. I would also note that the lasso is normally used with a lot of variables, so the labels will overlap heavily (as already seen in your small example).

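    lbs_fun <- function(fit, ...) {
      L <- length(fit$lambda)   # index of the smallest lambda
      x <- log(fit$lambda[L])   # its position on the log(lambda) axis
      y <- fit$beta[, L]        # coefficients at that lambda
      labs <- names(y)
      text(x, y, labels = labs, ...)
    }

    # plot
    plot(fit, xvar = "lambda")

    # label
    lbs_fun(fit)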
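
    One way to cut down the overlap mentioned above is to label only the coefficients that are nonzero at the smallest lambda and to shrink and offset the text. This is a rough sketch, not a thorough solution: label_nonzero is a hypothetical helper of mine, and it assumes the x axis is log(lambda), as in the code above.

    ```r
    library(glmnet)

    fit <- glmnet(as.matrix(mtcars[-1]), mtcars$mpg)

    # hypothetical helper: like lbs_fun, but skips coefficients that are exactly zero
    label_nonzero <- function(fit, ...) {
      L <- length(fit$lambda)
      y <- fit$beta[, L]
      y <- y[y != 0]                           # keep nonzero coefficients only
      x <- rep(log(fit$lambda[L]), length(y))  # assumes a log(lambda) x axis
      text(x, y, labels = names(y), ...)
      invisible(y)                             # return what was labelled
    }

    plot(fit, xvar = "lambda")
    labelled <- label_nonzero(fit, pos = 4, cex = 0.7) # pos/cex are passed on to text()
    ```

    Labels for coefficients that are close together will still collide. For that case the segment-based function or plot_glmnet from the other answers is the better fit.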
    

