Question
My question and data are similar to those in the post "Loop Through Data with Sequential Time Lags output Linear Regression Coefficients":
set.seed(242)
df <- data.frame(
  month = order(seq(1, 248, 1), decreasing = TRUE),
  psit = c(79, 1, NA, 69, 66, 77, 76, 93, NA, 65, NA, 3, 45, 64, 88, 88,
           76, NA, NA, 85, sample(1:10, 228, replace = TRUE)),
  var = sample(1:10, 248, replace = TRUE)
)
However, the structure of my dataset differs because I have imputed the missing values for psit. After using the mice() function to impute values, psit, month and var are nested within a list, tempdata, which now includes 40 imputed datasets.
tempdata <- mice(data = df, m = 40, method = "pmm", maxit = 50, seed = 500)
I want to take the 40 imputed datasets, run the same time lag analysis on each of them (this differs from the post above, where there was only one dataset on which to perform the time lag analysis), and pool the R-squared values for each matching time lag across all imputed datasets.
Posts on mice indicate you can pool the results of an lm() fit using:
modelFit1 <- with(tempdata,lm(psit~ month))
summary(pool(modelFit1))
However, I want to pool the R-squared values for matching time lags across all 40 imputed datasets. I am unsure how to use the dyn$lm() function on each imputed dataset in tempdata and then use the pool() function to pool the R-squared values.
To achieve that result, I have tried the following, but I get an error:
modelFit1 <- with(tempData, lapply(1:236, function(i) dyn$lm(psit ~
lag(var, -i),tail(z, 12+i))))
summary(pool(modelFit1),function(x) summary(x)$r.squared))
Answer 1:
Since you are using the mice package, wouldn't pool.r.squared() work for your purpose?
pool.r.squared(modelFit1, adjusted = FALSE)
# est lo 95 hi 95 fmi
# R^2 0.1345633 0.06061036 0.226836 0.1195257
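The call above pools R-squared for a single model. To get one pooled R-squared per time lag, one possible approach is to fit the lagged model on every completed dataset, wrap the fits with as.mira(), and pass the result to pool.r.squared(). The following is only a minimal sketch under several assumptions: pool_r2_for_lag is a hypothetical helper name, the lag is built with a plain base-R shift instead of dyn$lm(), rows are sorted by month so that a lag of k means the value of var from k months earlier, and every regression uses all available rows rather than the tail(z, 12 + i) window from the question.

```r
library(mice)

# Hypothetical helper (not part of mice): fit psit ~ var lagged by k months on
# each completed dataset, then pool the R-squared values across imputations.
pool_r2_for_lag <- function(imp, k) {
  fits <- lapply(seq_len(imp$m), function(j) {
    d <- mice::complete(imp, j)           # j-th completed dataset
    d <- d[order(d$month), ]              # assumption: sort oldest month first
    n <- nrow(d)
    # Shift var down by k rows; the first k rows have no lagged value.
    d$var_lag <- c(rep(NA, k), d$var[seq_len(n - k)])
    lm(psit ~ var_lag, data = d)          # rows with NA var_lag are dropped
  })
  # as.mira() wraps the list of lm fits so pool.r.squared() accepts it.
  pool.r.squared(as.mira(fits), adjusted = FALSE)
}

# Usage with the mids object from the question, if it exists in the session.
if (exists("tempdata")) {
  lags <- 1:12                            # illustrative range of lags
  pooled <- lapply(lags, function(k) pool_r2_for_lag(tempdata, k))
  names(pooled) <- paste0("lag_", lags)
}
```

Each element of pooled would then hold the est, lo 95, hi 95 and fmi columns for one lag, in the same layout as the output shown above.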
Source: https://stackoverflow.com/questions/46395927/time-lag-analysis-on-list-of-imputed-datasets