Question
Reading the description of glm in R, it is not clear to me what the difference is between specifying a model offset in the formula and using the offset argument.
In my model I have a response y that should be divided by an offset term w, and for simplicity let's assume we have the covariate x. I use a log link.
What is the difference between
glm(log(y)~x+offset(-log(w)))
and
glm(log(y)~x,offset=-log(w))
Answer 1:
The two ways are identical.
This can be seen in the documentation (the relevant sentence is the one beginning "One or more offset terms"):
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector of length equal to the number of cases. One or more offset terms can be included in the formula instead or as well, and if more than one is specified their sum is used. See model.offset.
The above talks about the offset argument in the glm function and says it can be included in the formula instead or as well.
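To make the quoted passage concrete, here is how the offset enters the linear predictor. This is only a sketch in my own notation: the symbols \eta_i, \beta_0, \beta_1 and o_i are assumptions for illustration and do not come from the documentation.

```latex
\eta_i = \beta_0 + \beta_1 x_i + o_i,
\qquad
o_i = o_i^{(\text{argument})} + \sum_{k} o_{ik}^{(\text{formula})}
```

Here o_i is a known quantity whose coefficient is fixed at 1 rather than estimated. In both calls from the question, o_i = -\log(w_i): the first supplies it as a single formula term and the second as the argument, so the fitted \beta_0 and \beta_1 should agree.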
A quick example below shows that the above is true:
Data
y <- sample(1:2, 50, replace = TRUE)
x <- runif(50)
w <- 1:50
df <- data.frame(y,x)
First model:
> glm(log(y)~x+offset(-log(w)))
Call: glm(formula = log(y) ~ x + offset(-log(w)))
Coefficients:
(Intercept)            x
     3.6272      -0.4152
Degrees of Freedom: 49 Total (i.e. Null); 48 Residual
Null Deviance: 44.52
Residual Deviance: 43.69 AIC: 141.2
And the second way:
> glm(log(y)~x,offset=-log(w))
Call: glm(formula = log(y) ~ x, offset = -log(w))
Coefficients:
(Intercept)            x
     3.6272      -0.4152
Degrees of Freedom: 49 Total (i.e. Null); 48 Residual
Null Deviance: 44.52
Residual Deviance: 43.69 AIC: 141.2
As you can see, the two fits are identical.
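Rather than comparing printed output by eye, the equivalence can also be checked programmatically. The following is a minimal sketch, not part of the original answer: the seed and the object names (m_formula, m_argument, m_both) are my own assumptions, and the numbers it produces will not match the output printed above, which was generated without a fixed seed. The third model illustrates the "as well" case from the documentation by splitting the offset between the formula and the argument.

```r
set.seed(1)  # assumed seed, only to make this sketch reproducible
y <- sample(1:2, 50, replace = TRUE)
x <- runif(50)
w <- 1:50

# Offset given as a formula term
m_formula <- glm(log(y) ~ x + offset(-log(w)))
# Offset given through the offset argument
m_argument <- glm(log(y) ~ x, offset = -log(w))

# Compare coefficients and residual deviance of the two fits
same_coef <- isTRUE(all.equal(coef(m_formula), coef(m_argument)))
same_dev  <- isTRUE(all.equal(deviance(m_formula), deviance(m_argument)))

# "As well": half of the offset in the formula, half in the argument;
# glm uses the sum of the two, which is again -log(w)
m_both <- glm(log(y) ~ x + offset(-log(w) / 2), offset = -log(w) / 2)
same_sum <- isTRUE(all.equal(coef(m_formula), coef(m_both)))

print(c(same_coef = same_coef, same_dev = same_dev, same_sum = same_sum))
```

If the two specifications are equivalent, as the documentation states, all three checks should come out TRUE.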
Source: https://stackoverflow.com/questions/29228679/offset-specification-in-r