Question
How can I perform an operation (like subsetting or adding a calculated column) on each imputed dataset in an object of class mids from R's package mice? I would like the result to still be a mids object.
Edit: Example
library(mice)
data(nhanes)
# create imputed datasets
imput = mice(nhanes)
The imputed datasets are stored as a list of lists in imput$imp, where there are rows only for the observations with imputation for the given variable.
The original (incomplete) dataset is stored in imput$data.
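To make that layout concrete, here is a minimal sketch that inspects the two components. It assumes the nhanes data shipped with mice; the seed and the name n_missing_chl are illustrative choices, not part of the question.

```r
library(mice)

# Impute with the default settings; the seed is only for reproducibility
imput <- mice(nhanes, seed = 1, printFlag = FALSE)

# One entry per variable in the data
names(imput$imp)

# For chl: one row per observation with missing chl, one column per imputation
n_missing_chl <- sum(is.na(nhanes$chl))
dim(imput$imp$chl)
n_missing_chl
imput$m

# The original data, missing values included, is kept unchanged
head(imput$data)
```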
For example, how would I create a new variable calculated as chl/2 in each of the imputed datasets, yielding a new mids object?
Answer 1:
Another option is to calculate the variables before the imputation and place restrictions on them.
library(mice)
# Create the additional variable - it will be missing wherever chl is missing
nhanes$extra <- nhanes$chl / 2
# Change the method of imputation for extra, so that it always equals chl/2
# Change the predictor matrix so only chl predicts extra
ini <- mice(nhanes, maxit = 0, printFlag = FALSE)
meth <- ini$method
meth["extra"] <- "~I(chl/2)"
pred <- ini$predictorMatrix # extra isn't used to predict
pred["extra", "chl"] <- 1
# Imputations
imput <- mice(nhanes, seed = 1, predictorMatrix = pred, method = meth, printFlag = FALSE)
There are examples in the paper "mice: Multivariate Imputation by Chained Equations in R".
Answer 2:
This can be done easily as follows:
Use complete() to convert a mids object to a long-format data.frame:
long1 <- complete(midsobj1, action='long', include=TRUE)
Perform whatever manipulations are needed:
long1$new.var <- long1$chl/2
# age is coded 1 to 3 in nhanes, so the cut-off must lie in that range to keep any rows
long2 <- subset(long1, age >= 2)
Use as.mids() to convert the manipulated data back to a mids object:
midsobj2 <- as.mids(long2)
Now you can use midsobj2 as required. Note that include=TRUE (used to include the original data with missing values) is needed for as.mids() to compress the long-formatted data properly. Note also that prior to mice v2.25 there was a bug in the as.mids() function (see this post: https://stats.stackexchange.com/a/158327/69413).
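The round trip described above can be sketched end to end as follows. This assumes the nhanes data and a mice version without the as.mids() bug mentioned; the object names imp1 and imp2 and the seed are illustrative choices.

```r
library(mice)

# Start from an ordinary mids object
imp1 <- mice(nhanes, seed = 1, printFlag = FALSE)

# Stack the original data (.imp == 0) and all imputed datasets (.imp >= 1)
long1 <- complete(imp1, action = "long", include = TRUE)

# Add the calculated column to every stacked dataset at once
long1$new.var <- long1$chl / 2

# Convert back; as.mids() relies on the .imp and .id columns added by complete()
imp2 <- as.mids(long1)

# The result is again a mids object with the same number of imputations
class(imp2)
imp2$m

# The new column is present and consistent in a completed dataset
comp1 <- complete(imp2, 1)
all.equal(comp1$new.var, comp1$chl / 2)
```

Because imp2 is a regular mids object, it can be passed on to with() and pool() in the usual way.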
EDIT: According to this answer https://stackoverflow.com/a/34859264/4269699 (from what is essentially a duplicate question), you can also edit the mids object directly by accessing $data and $imp. For example:
midsobj2 <- midsobj1
midsobj2$data$new.var <- midsobj2$data$chl/2
midsobj2$imp$new.var <- midsobj2$imp$chl/2
You will run into trouble though if you want to subset $imp or if you want to use $call, so I wouldn't recommend this solution in general.
Answer 3:
There is an overload of with() that can help you here:
with(imput, chl/2)
The documentation is given at ?with.mids.
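A short sketch of what this call returns, assuming the nhanes data; the name res and the seed are illustrative. One thing worth checking for the original question: with() on a mids object gives back a mira object (one result per imputed dataset), not a new mids object.

```r
library(mice)

imput <- mice(nhanes, seed = 1, printFlag = FALSE)

# Evaluate the expression once inside each completed dataset
res <- with(imput, chl / 2)

# The result is of class mira; the per-dataset results sit in res$analyses
class(res)
length(res$analyses)

# First few values of chl/2 computed on the first completed dataset
head(res$analyses[[1]])
```

This is convenient when the per-dataset results are the goal (for example, fitting a model and pooling), while the long-format route keeps everything inside a mids object.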
Source: https://stackoverflow.com/questions/26667162/perform-operation-on-each-imputed-dataset-in-rs-mice