Question
I am researching how to use multiple imputation results. The following is my understanding; please let me know if there are any mistakes.
Suppose you have a data set with missing values, and you want to conduct a regression analysis. You may perform multiple imputation with m = 5, run the regression on each of the 5 imputed data sets, and then "pool" the coefficient estimates from these m = 5 models via Rubin's rules (or use the pool() function in the R package mice).
My question is about the function complete() in mice. The manual says you can extract a completed data set by using complete(object).
But if I run mice with m = 5, does it still make sense to use complete()? Which imputation results will complete() return?
Also, does it make sense if I only use mice with m = 1? Thank you.
Answer 1:
You probably overlooked that mice::complete() uses action = 1 as its default argument, which "returns the first imputed data set" (see ?mice::complete). On its own, that single data set is of little use, because it ignores the variation between imputations.
To account for the "multiplicity" of the multiple imputation, use action = "long", which returns all m imputed data sets stacked into one data frame.
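To make the difference between the action values concrete, here is a minimal sketch. It assumes the mice package is installed and uses nhanes, the small example data set that ships with mice; the seed and the object names are arbitrary choices for illustration.

```r
library(mice)  # assumption: the mice package is installed

# nhanes is an example data set bundled with mice (25 rows, with NAs)
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

first <- complete(imp)                   # default action = 1: first imputed data set only
third <- complete(imp, action = 3)       # the third imputed data set
long  <- complete(imp, action = "long")  # all 5 data sets stacked, plus .imp and .id columns

nrow(first)       # one row per individual
nrow(long)        # m times as many rows
table(long$.imp)  # number of rows belonging to each imputation
```

The .imp column in the long format identifies which imputation each row came from, so the stacked data can be split up again for a per-imputation analysis.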
No, it makes no sense to use m = 1 (apart from debugging), because every imputation is based on a random process and you have to pool the results (by whatever method) to account for that variation. Often m > 20 is recommended.
Basically, multiple imputation works as follows:
- Run m imputation processes with a random component to obtain m slightly different imputed data sets.
- Analyze each imputed data set to get slightly different parameter estimates.
- Combine the results, calculating the variation in the parameter estimates.
(Also see multiple-imputation-in-a-nutshell for a brief overview.)
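The three steps above map onto mice(), with() and pool(). The following is only a sketch under stated assumptions: it uses the nhanes example data from mice, and the regression formula chl ~ age + bmi is chosen purely for illustration.

```r
library(mice)  # assumption: the mice package is installed

# Step 1: create m = 5 imputed data sets (stored in a mids object)
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# Step 2: fit the same model on each imputed data set
fit <- with(imp, lm(chl ~ age + bmi))

# Step 3: combine the m sets of estimates using Rubin's rules
pooled <- pool(fit)
summary(pooled)
```

Note that complete() is not needed anywhere in this workflow, because with() runs the analysis on every imputed data set internally.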
Answer 2:
When you use mice, you get an object (of class mids) that is not itself the imputed data set. You cannot perform operations on it directly without using the special functions in mice. If you want to extract the actual imputed data sets, you use complete(), the output of which (if using the "long" format) is a data.frame with one row per individual per imputation. If you are doing any analysis with your imputed data that cannot be performed within mice, you need to create this data set first.
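As a rough illustration of an analysis done outside mice, the sketch below extracts the completed data sets with complete(), fits a model on each one, and pools the results by hand with Rubin's rules. The nhanes data, the model formula and all variable names are assumptions made for the example; lm() stands in for whatever analysis mice cannot run for you.

```r
library(mice)  # assumption: the mice package is installed

imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# action = "all" returns a list holding the m completed data frames
datasets <- complete(imp, action = "all")

# Fit the same model separately on each completed data set
fits <- lapply(datasets, function(d) lm(chl ~ age + bmi, data = d))

# Rubin's rules by hand
m    <- length(fits)
est  <- sapply(fits, coef)                        # p x m matrix of estimates
vars <- sapply(fits, function(f) diag(vcov(f)))   # p x m matrix of squared standard errors

q_bar <- rowMeans(est)           # pooled point estimate
u_bar <- rowMeans(vars)          # within-imputation variance
b     <- apply(est, 1, var)      # between-imputation variance
t_var <- u_bar + (1 + 1 / m) * b # total variance

data.frame(estimate = q_bar, std.error = sqrt(t_var))
```

The between-imputation term b is exactly what gets lost when only a single completed data set is analyzed.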
Source: https://stackoverflow.com/questions/51370292/what-exactly-does-complete-in-mice-do