Interaction terms in multiple imputation (Amelia or other MI packages)

Question

I have a question about interaction terms in multiple imputations. My understanding is that the imputation model is supposed to include all information that is used in the later analysis including any transformations or interactions of variables (the Amelia user guide also makes this statement). But when I include the interaction term int=x1*x2 in the imputation, the imputed value for int is not equal to x1*x2. For example, when I have a binary variable x2 and a continuous variable x1, int should be zero when x2 is zero. That is not the case for the imputed values of int. So how do I treat interactions in multiple imputations? Below is some example code illustrating the question.

library("Amelia")

n = 100
p.na = 0.1
n.na = ceiling(n*p.na)
set.seed(12345)
# create data
df = data.frame(
    'x1' = rnorm(n),
    'x2' = rbinom(n,1,0.5),
    'int'= NA
)
df$x1[sample(1:100,n.na)]=NA
df$x1[sample(1:100,n.na)]=NA
df$int = with(df,x1*x2)
# impute
df.mi = amelia(df,m=2,noms=c("x2"))

# comparison
round(cbind(df,df.mi$imputations[[1]])[1:10,],2)
cbind(
    'df' = with(df,int==x1*x2),
    'df.mi' = with(df.mi$imputations[[1]],int==x1*x2))

Here is some of the output (row 6 is one of the cases discussed above, for which int != x1*x2):

      DF           DF (imputed)
      x1 x2   int    x1 x2   int
1   0.59  1  0.59  0.59  1  0.59
2   0.71  1  0.71  0.71  1  0.71
3  -0.11  0  0.00 -0.11  0  0.00
4  -0.45  1 -0.45 -0.45  1 -0.45
5   0.61  1  0.61  0.61  1  0.61
6     NA  1    NA  0.24  1  0.48
7   0.63  0  0.00  0.63  0  0.00
8  -0.28  0  0.00 -0.28  0  0.00
9  -0.28  1 -0.28 -0.28  1 -0.28
10 -0.92  1 -0.92 -0.92  1 -0.92

Answer 1:

I think the issue is that you never tell Amelia that int is the result of a transformation, x1*x2, so it treats int as an ordinary variable. You can, however, apply a post-transformation to the imputed data like this:

   df.mi = transform(df.mi, int = x2*x1)

Comparing with the original data, you get this result:

mm <- cbind(df,df.mi$imputations$imp1)
mm[mm$x2==0 & is.na(mm$int),]
   x1 x2 int         x1 x2 int
45 NA  0  NA  0.3144084  0   0
49 NA  0  NA -1.1741704  0   0
76 NA  0  NA -0.2018450  0   0
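
To make the post-transformation step concrete, here is a minimal base-R sketch. The list imps is a hypothetical stand-in for df.mi$imputations, filled with made-up values rather than real Amelia output, and make_imp is a helper invented for this illustration. It recomputes int in every completed data set and then checks that int equals x1*x2.

```r
# Hypothetical stand-in for df.mi$imputations: a list of completed data frames
# in which `int` was imputed as a free variable and so drifts from x1 * x2.
set.seed(1)
make_imp <- function(n = 10) {
  d <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
  d$int <- d$x1 * d$x2 + rnorm(n, sd = 0.1)
  d
}
imps <- list(imp1 = make_imp(), imp2 = make_imp())

# Recompute the interaction from the completed x1 and x2 in every data set
imps_fixed <- lapply(imps, function(d) transform(d, int = x1 * x2))

# TRUE for a data set when int equals x1 * x2 in every row
consistent <- sapply(imps_fixed, function(d) isTRUE(all.equal(d$int, d$x1 * d$x2)))
print(consistent)
```

The same check can be run on the real object by replacing imps_fixed with df.mi$imputations after the transform call.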

EDIT: I think I get better results using the mice package, which is described as follows:

"The algorithm imputes an incomplete column (the target column) by generating 'plausible' synthetic values given other columns in the data."

Using your data, I compare the original data.frame with all the imputed data sets for the rows where x2 is equal to 0.

library(mice)
rr <- mice(df)
mm1 <- cbind(df,do.call(cbind,lapply(1:5,function(i)complete(rr , i))))
mm1[mm1$x2==0 & is.na(mm1$int),]

  x1 x2 int        x1 x2       int        x1 x2        int         x1 x2       int        x1 x2       int        x1 x2        int
20 NA  0  NA 0.5168547  0 -0.162311 0.6203798  0  0.0000000  0.8881394  0 0.0000000 0.9371405  0 0.8248701 0.5855288  0  0.0000000
23 NA  0  NA 0.5168547  0  0.000000 0.4911883  0  0.0000000 -1.8323773  0 0.0000000 0.5855288  0 0.0000000 0.5855288  0  0.0000000
31 NA  0  NA 0.5168547  0  0.000000 0.1495920  0 -0.3240866  2.3305120  0 1.6324456 1.1207127  0 0.8544517 0.5674033  0  0.0000000
60 NA  0  NA 0.5365237  0  0.000000 0.2542712  0  0.0000000  1.5934885  0 0.9371405 0.7094660  0 0.5168547 0.2542712  0 -0.3079534
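
In the output above, some imputed int values are nonzero even though x2 is 0, so the default mice run does not enforce int = x1*x2 either. One option that may help is mice's passive imputation, where a method string starting with "~" tells mice to recompute a column from other columns instead of modeling it. The sketch below is an illustration under assumptions, not part of the original answer: the data are regenerated along the lines of the question with only x1 made missing, and dropping int as a predictor of x1 is a choice made here to avoid circular feedback.

```r
library(mice)

# Data regenerated along the lines of the question (assumption: only x1 is missing)
set.seed(12345)
n  <- 100
df <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
df$x1[sample(seq_len(n), 10)] <- NA
df$int <- df$x1 * df$x2

# Dry run to obtain the default methods and predictor matrix
ini  <- mice(df, maxit = 0, printFlag = FALSE)
meth <- ini$method
pred <- ini$predictorMatrix

# Passive imputation: int is recomputed from x1 and x2, not drawn from a model
meth["int"] <- "~I(x1 * x2)"
# Assumption: keep int out of the model for x1 to avoid circular feedback
pred["x1", "int"] <- 0

imp <- mice(df, method = meth, predictorMatrix = pred,
            m = 5, seed = 12345, printFlag = FALSE)

# Check every completed data set: int should equal x1 * x2
ok <- sapply(seq_len(imp$m), function(i) {
  d <- complete(imp, i)
  isTRUE(all.equal(d$int, d$x1 * d$x2))
})
print(ok)
```

Whether passive imputation or imputing the interaction as just another variable is preferable for the later analysis is a separate modeling question. This sketch only shows how to keep the columns consistent.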

Source: https://stackoverflow.com/questions/17628481/interactions-terms-in-multiple-imputations-amelia-or-other-mi-packages

Tags

missing-data