问题
I'm running an elastic net on a generalized linear model with the glmnet and caret packages in R.
My response variable is cost (where cost > $0) and hence I'd like to specify a Gaussian family with a log link for my GLM. However glmnet doesn't seem to allow me to specify (link="log")
as follows:
> lasso_fit <- glmnet(x, y, alpha=1, family="gaussian"(link="log"), lambda.min.ratio=.001)
I've tried different variants, with and without quotations, but no luck. The glmnet documentation doesn't discuss how to include a log link.
Am I missing something? Does family="gaussian"
already implicitly assume a log link?
回答1:
It is a bit confusing, but the family
argument in glmnet
and glm
are quite different. In glm
, you can specify a character
like "gaussian"
, or you can specify a function with some arguments, like gaussian(link="log")
. In glmnet
, you can only specify the family with a character
, like "gaussian"
, and there is no way to automatically set the link through that argument.
The default link for gaussian
is the identity
function, that is, no transformation at all. But, remember that a link function is just a transformation of your y
variable; you can just specify it yourself:
glmnet(x, log(y), family="gaussian")
Also note that the default link for the poisson
family is log
, but the objective function will change. See the Details under ?glmnet
in the first couple of paragraphs.
Your comments have led me to rethink my answer; I have evidence that it is not correct.
As you point out, there is a difference between E[log(Y)] and log(E[Y]). I think what the above code does is to fit E[log(Y)], which is not what you want. Here is some code to generate data and confirm what you noted in the comments:
# Generate data
set.seed(1)
x <- replicate(3,runif(1000))
y <- exp(2*x[,1] + 3*x[,2] + x[,3] + runif(1000))
df <- data.frame(y,x)
# Run the model you *want*
glm(y~., family=gaussian(link="log"), data=df)$coef
# (Intercept) X1 X2 X3
# 0.4977746 2.0449443 3.0812333 0.9451073
# Run the model you *don't want* (in two ways)
glm(log(y)~., family=gaussian(link='identity'), data=df)$coef
# (Intercept) X1 X2 X3
# 0.4726745 2.0395798 3.0167274 0.9957110
lm(log(y)~.,data=df)$coef
# (Intercept) X1 X2 X3
# 0.4726745 2.0395798 3.0167274 0.9957110
# Run the glmnet code that I suggested - getting what you *don't want*.
library(glmnet)
glmnet.model <- glmnet(x,log(y),family="gaussian", thresh=1e-8, lambda=0)
c(glmnet.model$a0, glmnet.model$beta[,1])
# s0 V1 V2 V3
# 0.4726745 2.0395798 3.0167274 0.9957110
来源:https://stackoverflow.com/questions/25206844/how-to-specify-log-link-in-glmnet