Question
Say I have a very simple model
library(foreign)
smoke <- read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/smoke.dta")
smoking.reg <- lm(cigs ~ educ, data=smoke)
AIC(smoking.reg)
BIC(smoking.reg)
In R I get the following results:
> AIC(smoking.reg)
[1] 6520.26
> BIC(smoking.reg)
[1] 6534.34
However, running the same regression in Stata
use http://fmwww.bc.edu/ec-p/data/wooldridge/smoke.dta
reg cigs educ
estat ic
returns different values for AIC and BIC.
How can I get R to return exactly the same values as does Stata for AIC and BIC?
Answer 1:
AIC is calculated as -2 * log-likelihood + 2 * (number of parameters)
BIC is calculated as -2 * log-likelihood + log(n) * (number of parameters), where n is the sample size.
Your linear regression has three parameters (two coefficients and the variance), so you can calculate AIC and BIC as
ll = logLik(smoking.reg)
aic = -2*ll + 2* 3 # 6520.26
bic = -2*ll + log(nrow(smoke))* 3 # 6534.34
(As Ben Bolker mentioned in the comments, the logLik object has several attributes that you can use to get the number of parameters ("df") and the number of observations ("nobs"). See attr(ll, "df") and attr(ll, "nobs").)
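To illustrate the attributes mentioned above, here is a minimal sketch that rebuilds R's AIC and BIC from the logLik object alone. It uses the built-in mtcars data and a model named fit purely as stand-ins (both are assumptions, chosen so the snippet runs without downloading anything); the same steps should apply to smoking.reg.

```r
# Built-in data set used only for illustration (an assumption; not the smoke data).
fit <- lm(mpg ~ wt, data = mtcars)

ll <- logLik(fit)
k  <- attr(ll, "df")    # parameters counted by R: 2 coefficients + the variance
n  <- attr(ll, "nobs")  # number of observations used in the fit

aic_manual <- -2 * as.numeric(ll) + 2 * k
bic_manual <- -2 * as.numeric(ll) + log(n) * k

# Compare the manual values with R's built-in functions
all.equal(aic_manual, AIC(fit))
all.equal(bic_manual, BIC(fit))
```

Reading k and n from the attributes avoids hard-coding 3 and nrow(smoke), which matters if rows with missing values were dropped during fitting.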
Stata does not count the variance parameter; it counts only the coefficients. This is usually not a problem, because information criteria are typically used to compare models (AIC_of_model1 - AIC_of_model2), so omitting this parameter from both calculations makes no difference. In Stata the calculation is
aic = -2*ll + 2* 2 # 6518.26
bic = -2*ll + log(nrow(smoke))* 2 # 6527.647
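If you want this convention for arbitrary lm fits, one possible approach is a small wrapper that subtracts the variance parameter from the "df" attribute. The function name ic_coef_only is hypothetical, and the sketch assumes an ordinary lm fit, where exactly one non-coefficient parameter is counted; it is demonstrated on the built-in mtcars data rather than the smoke data.

```r
# Hypothetical helper: information criteria that count only the regression
# coefficients, following the Stata convention described above.
ic_coef_only <- function(fit) {
  ll <- logLik(fit)
  k  <- attr(ll, "df") - 1          # drop the variance parameter (assumes an lm fit)
  n  <- attr(ll, "nobs")
  c(AIC = -2 * as.numeric(ll) + 2 * k,
    BIC = -2 * as.numeric(ll) + log(n) * k)
}

fit <- lm(mpg ~ wt, data = mtcars)  # built-in data, for illustration only
ic  <- ic_coef_only(fit)

# Differences from R's defaults: 2 for AIC and log(n) for BIC
AIC(fit) - ic[["AIC"]]
BIC(fit) - ic[["BIC"]]
```

Applied to the model in the question, ic_coef_only(smoking.reg) should correspond to the 6518.26 and 6527.647 computed above.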
Source: https://stackoverflow.com/questions/62307197/how-to-get-the-same-values-for-aic-and-bic-in-r-as-in-stata