I would like to extract the glmnet generated model coefficients and create a SQL query from them. The function coef(cv.glmnet.fit)
yields a 'dgCMa
Assuming you know how to obtain your lambda, here are two ways to list the predictors retained in the selected model at that particular lambda. Only the second one includes the intercept. The lambda can be chosen by cross-validation by means of cv.glmnet from the glmnet package. You might want to look only at the last lines of each method:
library(glmnet)
myFittedLasso = glmnet(x=myXmatrix, y=myYresponse, family="binomial")
myCrossValidated = cv.glmnet(x=myXmatrix, y=myYresponse, family="binomial")
myLambda = myCrossValidated$lambda.1se # lambda.min is the other common choice
# Method 1 without the intercept
myBetas = myFittedLasso$beta[, which(myFittedLasso$lambda == myLambda)]
myBetas[myBetas != 0]
## myPredictor1 myPredictor2 myPredictor3
## 0.24289802 0.07561533 0.18299284
# Method 2 with the intercept
myCoefficients = coef(myFittedLasso, s=myLambda)
dimnames(myCoefficients)[[1]][which(myCoefficients != 0)]
## [1] "(Intercept)" "myPredictor1" "myPredictor2" "myPredictor3"
myCoefficients[which(myCoefficients != 0)]
## [1] -4.07805560 0.24289802 0.07561533 0.18299284
Note that the example above assumes a binomial family, but the same steps apply to other families as well.
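To get from those coefficients to the SQL query asked about in the question, something along these lines might work. This is only a sketch: coefs_to_sql, the table name my_table and the output column score are hypothetical names of mine, not part of glmnet, and it assumes your predictor names are also the column names in the database. It takes a plain named numeric vector, which you can get from the sparse matrix with as.matrix(coef(myFittedLasso, s=myLambda))[, 1].

```r
# Hypothetical helper (not part of glmnet): build a SQL scoring query from a
# named coefficient vector. Assumes the intercept is named "(Intercept)" and
# that the remaining names match column names in the target table.
coefs_to_sql <- function(coefs, table = "my_table", logistic = FALSE) {
  coefs <- coefs[coefs != 0]                       # keep selected terms only
  is_intercept <- names(coefs) == "(Intercept)"
  intercept <- if (any(is_intercept)) unname(coefs[is_intercept])[1] else 0
  betas <- coefs[!is_intercept]

  # Quote identifiers with double quotes, doubling any embedded quote
  quoted <- sprintf('"%s"', gsub('"', '""', names(betas), fixed = TRUE))
  terms <- sprintf("(%.10f * %s)", betas, quoted)
  linear <- paste(c(sprintf("%.10f", intercept), terms), collapse = " + ")

  # For a binomial fit, optionally wrap the linear predictor in the inverse logit
  score <- if (logistic) sprintf("1.0 / (1.0 + EXP(-(%s)))", linear) else linear
  sprintf("SELECT %s AS score FROM %s", score, table)
}

# Usage with the coefficient values printed above
myCoefVector <- c("(Intercept)" = -4.07805560, myPredictor1 = 0.24289802,
                  myPredictor2 = 0.07561533, myPredictor3 = 0.18299284)
cat(coefs_to_sql(myCoefVector, logistic = TRUE), "\n")
```

Identifier quoting and the name of the exponential function vary between SQL dialects, so adjust those two lines for your database.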
Building on Mehrad's solution above, here is a simple function to print a table containing only the non-zero coefficients:
library(knitr) # provides kable()
print_glmnet_coefs <- function(cvfit, s="lambda.min") {
    coefs <- coef(cvfit, s=s) # sparse one-column matrix, computed once
    ind <- which(coefs != 0)
    df <- data.frame(
        feature=rownames(coefs)[ind],
        coefficient=coefs[ind]
    )
    kable(df)
}
The function above uses the kable() function from knitr to produce a Markdown-ready table.