Question
I have a set of multicollinear variables and I'm trying to use ridge regression to tackle that. I am using the GLMNET package in R with alpha = 0 (for ridge regression).
library(glmnet)
I have a sequence of lambda values, and I am choosing the best one through cv.glmnet:
lambda <- 10^seq(10, -2, length = 100)
# Create the model matrix and assign the y variable
x <- model.matrix(dv ~ ., datamatrix) [,-1]
y <- datamatrix$dv
# Use cross-validation to determine the best lambda, then predict y at that lambda
ridge.mod <- glmnet(x, y, alpha = 0, lambda = lambda)
cv.out <- cv.glmnet(x, y, alpha = 0)
ridge.pred <- predict(ridge.mod, s = cv.out$lambda.min, newx = x)
Everything works up to this point, but I also need to check the VIF at this particular lambda value to confirm that the coefficients have stabilized and the multicollinearity is under control. I am not sure how to compute the VIF for a glmnet fit, since the usual vif() function throws this error:
Error in vcov.default(mod) : there is no vcov() method for models of class elnet, glmnet
Could you please help me identify if there is anything wrong in my approach or how to solve this issue?
Is VIF not applicable for validation in GLMNET?
Thanks in advance.
Answer 1:
VIF is a property of the set of independent variables only. It doesn't matter what the dependent variable is or what kind of model you use (linear regression, generalized linear model), as long as the model doesn't change the independent variables (as, for example, an additive model does). See the vif function from the car package. So VIF applied to an elastic net regression won't tell you whether you have dealt with multicollinearity. It can only tell you that there was multicollinearity to deal with.
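To illustrate that the VIF depends only on the predictors, here is a minimal sketch that computes it straight from a design matrix, with no response variable and no fitted model. It relies on the standard identity that the VIF of predictor j is the j-th diagonal element of the inverse correlation matrix. The data are simulated and the variable names are made up for the example.

```r
# Simulated predictors (illustrative only); x3 is nearly x1 + x2.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.1)
x  <- cbind(x1 = x1, x2 = x2, x3 = x3)

# VIF_j is the j-th diagonal element of the inverse correlation matrix.
vif_from_x <- diag(solve(cor(x)))

# Cross-check for x3 with the textbook form 1 / (1 - R^2_j), where R^2_j
# comes from regressing x3 on the remaining predictors.
r2_x3  <- summary(lm(x3 ~ x1 + x2))$r.squared
vif_x3 <- 1 / (1 - r2_x3)

print(vif_from_x)
print(all.equal(vif_from_x[[3]], vif_x3))
```

With the question's objects, the same one-liner would be diag(solve(cor(x))) on the model matrix x, provided no column is constant or an exact linear combination of the others.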
Answer 2:
Chatterjee and Hadi's Regression Analysis by Example (p. 295) gives a ridge regression definition of the VIF, in which Z is the standardized version of the covariate matrix.
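The formula itself is not reproduced in this answer. The sketch below assumes the definition commonly used for ridge regression: for penalty k, the VIFs are the diagonal elements of (Z'Z + kI)^{-1} Z'Z (Z'Z + kI)^{-1}, with Z centered and scaled so that Z'Z is the correlation matrix of the predictors. The helper ridge_vif is hypothetical and not part of glmnet or car. How k maps onto glmnet's lambda depends on glmnet's internal scaling and is not worked out here.

```r
# Hypothetical helper: ridge VIFs for penalty k on the correlation scale.
ridge_vif <- function(x, k) {
  R <- cor(x)                        # Z'Z for unit-length-scaled Z (assumption)
  A <- solve(R + k * diag(ncol(x)))  # (Z'Z + kI)^{-1}
  diag(A %*% R %*% A)
}

# Simulated predictors (illustrative only); x3 is nearly x1 + x2.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.1)
x  <- cbind(x1 = x1, x2 = x2, x3 = x3)

# One column of VIFs per penalty value; k = 0 gives the ordinary VIFs.
ks <- c(0, 0.01, 0.1, 1)
vif_path <- sapply(ks, function(k) ridge_vif(x, k))
colnames(vif_path) <- paste0("k=", ks)
print(round(vif_path, 3))
```

Under this definition the VIFs shrink as k grows, so tabulating them over a grid of penalties is one way to see where they settle.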
Answer 3:
The function car::vif will not work on objects resulting from a glmnet fit. You could potentially extract the column names from the glmnet fit and refit with lm (or with glm for a logistic model). Then run vif on the new fit.
This code should work.
library(car)
library(glmnet)

# cross-validated penalized logistic regression (glmnet's default alpha = 1)
cvfit <- cv.glmnet(train.x, train.y,
                   family = "binomial",
                   type.measure = "class",
                   nlambda = 1000)
tmp_coeffs <- coef(cvfit, s = "lambda.min")
# get the names of the nonzero coefficients
columns <- as.character(
  data.frame(
    name = tmp_coeffs@Dimnames[[1]][tmp_coeffs@i + 1],
    coefficient = tmp_coeffs@x)[, 'name']
)
# create formula from fit; columns[-1] drops the intercept
logistic_reduced <- as.formula(paste("outcome ~ ",
                                     paste(columns[-1], collapse = " + "),
                                     sep = ""))
# refit logistic regression; glm() is used because lm() would disregard the family argument
new.fit <- glm(logistic_reduced,
               family = binomial(link = 'logit'),
               data = train)
# get vif
vif(new.fit)
Source: https://stackoverflow.com/questions/44862009/ridge-regression-in-glmnet-in-r-calculating-vif-for-different-lambda-values-usi