Question
As far as I understand, cv.glmnet, which produces cvfit, performs K-fold cross-validation: in each round it splits the data into a training set and a validation set. For every fixed lambda, it first fits the model on the training data to get a coefficient vector, then uses that model to predict on the validation set and compute the error.
Hence, K-fold CV produces K coefficient vectors for each lambda (one from each training set). So what does coef(cvfit) return?
Here is an example:
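library(glmnet)
x <- iris[1:100, 1:4]
y <- iris[1:100, 5]
y <- factor(y)  # drop the unused third species level
# Cross-validated lasso (alpha = 1) with 3 folds
fit <- cv.glmnet(data.matrix(x), y, family = "binomial", type.measure = "class", alpha = 1, nfolds = 3, standardize = TRUE)
coef(fit, s = c(fit$lambda.min, fit$lambda.1se))
# Refit on the full data at the two selected lambda values
fit1 <- glmnet(data.matrix(x), y, family = "binomial",
               standardize = TRUE,
               lambda = c(fit$lambda.1se, fit$lambda.min))
coef(fit1)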
In fit1, I use the whole dataset as the training set, yet the coefficients of fit1 and fit appear to be the same. Why is that?
Thanks in advance.
Answer 1:
Although cv.glmnet checks model performance by cross-validation, the actual model coefficients it returns for each lambda value are based on fitting the model with the full dataset.
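To make the split between the two roles concrete, here is a rough sketch of what the cross-validation part does conceptually. It is not cv.glmnet's actual implementation: the names full_fit, foldid, fold_err and cv_error are made up for illustration, the seed is arbitrary, and the plain average over folds is a simplification. The point is that the per-fold fits are only used to score each lambda, while the coefficients come from the full-data path.

```r
library(glmnet)

set.seed(1)  # arbitrary seed, only so the fold assignment is reproducible
x <- data.matrix(iris[1:100, 1:4])
y <- factor(iris[1:100, 5])

# Full-data path: the coefficients reported for each lambda come from here.
full_fit <- glmnet(x, y, family = "binomial")
lambdas <- full_fit$lambda

# Hypothetical manual 3-fold assignment (cv.glmnet builds its own).
foldid <- sample(rep(1:3, length.out = nrow(x)))

# For each fold: fit on the other folds, then score on the held-out rows.
fold_err <- sapply(1:3, function(k) {
  test <- foldid == k
  fold_fit <- glmnet(x[!test, ], y[!test], family = "binomial", lambda = lambdas)
  pred <- predict(fold_fit, newx = x[test, ], s = lambdas, type = "class")
  colMeans(pred != as.character(y[test]))  # misclassification rate per lambda
})

# Average error per lambda across folds; the fold coefficients are discarded.
cv_error <- rowMeans(fold_err)
```

The fold-specific coefficient vectors never leave the loop; only the error estimates survive, and those are what lambda.min and lambda.1se are chosen from.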
The help for cv.glmnet (type ?cv.glmnet) includes a Value section that describes the object returned by cv.glmnet. The returned list object (fit in your case) includes an element called glmnet.fit. The help describes it like this:
glmnet.fit a fitted glmnet object for the full data.
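A quick way to check this yourself is to compare the coefficients extracted from the cross-validation object with those extracted directly from its glmnet.fit element. This is a minimal sketch using the data from the question; the variable names cvfit, from_cv and from_full and the seed are my own choices, not anything from the original post.

```r
library(glmnet)

set.seed(1)  # arbitrary seed so the CV folds are reproducible
x <- data.matrix(iris[1:100, 1:4])
y <- factor(iris[1:100, 5])

cvfit <- cv.glmnet(x, y, family = "binomial", type.measure = "class", nfolds = 3)

# Coefficients requested through the cross-validation object ...
from_cv <- coef(cvfit, s = "lambda.min")
# ... and from the full-data fit stored inside it, at the same lambda.
from_full <- coef(cvfit$glmnet.fit, s = cvfit$lambda.min)

all.equal(as.matrix(from_cv), as.matrix(from_full))
```

The comparison should come back TRUE, which is consistent with coef on a cv.glmnet object simply reading from the full-data fit at the requested lambda.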
Source: https://stackoverflow.com/questions/48199045/r-coefficients-of-glmnetcvfit