Question
I have a question about interaction terms in multiple imputation. My understanding is that the imputation model is supposed to include all information that is used in the later analysis, including any transformations or interactions of variables (the Amelia user guide also makes this statement). But when I include the interaction term int=x1*x2 in the imputation, the imputed value for int is not equal to x1*x2. For example, when I have a binary variable x2 and a continuous variable x1, int should be zero whenever x2 is zero. That is not the case for the imputed values of int. So how should I treat interactions in multiple imputation? Below is some example code illustrating the question.
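library("Amelia")
n = 100                 # number of observations
p.na = 0.1              # share of x1 values to delete per pass
n.na = ceiling(n*p.na)
set.seed(12345)
# create data: continuous x1, binary x2, interaction filled in below
df = data.frame(
  'x1' = rnorm(n),
  'x2' = rbinom(n,1,0.5),
  'int'= NA
)
# delete some x1 values (done twice, so up to 2*n.na values of x1 are missing)
df$x1[sample(seq_len(n),n.na)]=NA
df$x1[sample(seq_len(n),n.na)]=NA
# interaction term; NA wherever x1 is NA
df$int = with(df,x1*x2)
# impute: m = 2 completed data sets, x2 declared nominal
df.mi = amelia(df,m=2,noms=c("x2"))
# comparison: first 10 rows of the original data next to the first imputation
round(cbind(df,df.mi$imputations[[1]])[1:10,],2)
# TRUE where int equals x1*x2 (NA where a value is still missing)
cbind(
  'df' = with(df,int==x1*x2),
  'df.mi' = with(df.mi$imputations[[1]],int==x1*x2))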
And some of the output (row 6 is one of the cases discussed above, for which int != x1*x2):
   DF               DF (imputed)
      x1 x2   int      x1 x2   int
1   0.59  1  0.59    0.59  1  0.59
2   0.71  1  0.71    0.71  1  0.71
3  -0.11  0  0.00   -0.11  0  0.00
4  -0.45  1 -0.45   -0.45  1 -0.45
5   0.61  1  0.61    0.61  1  0.61
6     NA  1    NA    0.24  1  0.48
7   0.63  0  0.00    0.63  0  0.00
8  -0.28  0  0.00   -0.28  0  0.00
9  -0.28  1 -0.28   -0.28  1 -0.28
10 -0.92  1 -0.92   -0.92  1 -0.92
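To find every such row instead of scanning the printout, a small helper can flag where the stored int disagrees with x1*x2. This is a minimal base-R sketch: the function name int_consistent and the tolerance are illustrative assumptions, and the three rows are copied from the printed output above (already rounded to two decimals).

```r
# Rows 1, 3 and 6 of the first imputation, copied from the printed output
shown <- data.frame(
  x1  = c(0.59, -0.11, 0.24),
  x2  = c(1, 0, 1),
  int = c(0.59, 0.00, 0.48),
  row.names = c("1", "3", "6")
)

# Hypothetical helper: TRUE where the stored interaction agrees with x1 * x2.
# A small tolerance is used instead of == to ignore floating-point noise;
# rows with a missing value give NA, as with the == comparison in the question.
int_consistent <- function(d, tol = 1e-8) {
  abs(d$int - d$x1 * d$x2) < tol
}

ok <- int_consistent(shown)
print(ok)                    # TRUE TRUE FALSE
print(rownames(shown)[!ok])  # "6"
```

The same call works on a full completed data set such as df.mi$imputations[[1]].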
Answer 1:
I think that, in any case, you never tell Amelia that int is the result of a transformation (x1*x2), so it treats int as an ordinary variable. But you can apply the transformation after imputation, like this:
df.mi = transform(df.mi, int = x2*x1)
Comparing with the original data, you get this result:
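# original columns next to the first completed data set
mm <- cbind(df,df.mi$imputations$imp1)
# rows with x2 == 0 whose int was originally missing
mm[mm$x2==0 & is.na(mm$int),]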
    x1 x2 int         x1 x2 int
45  NA  0  NA  0.3144084  0   0
49  NA  0  NA -1.1741704  0   0
76  NA  0  NA -0.2018450  0   0
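The post-imputation step is not specific to Amelia: whatever produces the m completed data sets, the interaction can be rebuilt from its imputed components in each one. The sketch below uses hypothetical toy data and a crude random draw from the observed x1 values as a stand-in for a real imputation model (this is not what Amelia or mice do); only the final lapply() step corresponds to the transform() call above.

```r
set.seed(1)

# Hypothetical toy data: x1 has missing values, x2 is complete
n   <- 20
toy <- data.frame(x1 = rnorm(n), x2 = rbinom(n, 1, 0.5))
toy$x1[c(3, 7, 12)] <- NA
toy$int <- toy$x1 * toy$x2   # NA wherever x1 is NA

# Stand-in imputation (assumption, for illustration only): fill each missing
# x1 by sampling with replacement from the observed x1 values
impute_once <- function(d) {
  miss <- is.na(d$x1)
  d$x1[miss] <- sample(d$x1[!miss], sum(miss), replace = TRUE)
  d
}

m    <- 5
imps <- replicate(m, impute_once(toy), simplify = FALSE)

# Post-imputation step: recompute the interaction from the imputed
# components in every completed data set
imps <- lapply(imps, function(d) {
  d$int <- d$x1 * d$x2
  d
})

# Each completed data set now satisfies int == x1 * x2 exactly
sapply(imps, function(d) all(d$int == d$x1 * d$x2))   # TRUE TRUE TRUE TRUE TRUE
```

Because int is derived after imputation, it is zero in every completed data set whenever x2 is zero.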
EDIT: I think I get better results using the mice package, which, as its documentation puts it:
"The algorithm imputes an incomplete column (the target column) by generating 'plausible' synthetic values given other columns in the data."
Using your data, I compare the original data.frame with all the imputed data sets for the rows where x2 is 0 and int was originally missing:
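library(mice)
rr <- mice(df)   # m = 5 imputations by default
# original data next to all five completed data sets, side by side
mm1 <- cbind(df,do.call(cbind,lapply(1:5,function(i)complete(rr, i))))
# rows with x2 == 0 whose int was originally missing
mm1[mm1$x2==0 & is.na(mm1$int),]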
    x1 x2 int         x1 x2        int         x1 x2        int         x1 x2        int         x1 x2        int         x1 x2        int
20  NA  0  NA  0.5168547  0  -0.162311  0.6203798  0  0.0000000  0.8881394  0  0.0000000  0.9371405  0  0.8248701  0.5855288  0  0.0000000
23  NA  0  NA  0.5168547  0   0.000000  0.4911883  0  0.0000000 -1.8323773  0  0.0000000  0.5855288  0  0.0000000  0.5855288  0  0.0000000
31  NA  0  NA  0.5168547  0   0.000000  0.1495920  0 -0.3240866  2.3305120  0  1.6324456  1.1207127  0  0.8544517  0.5674033  0  0.0000000
60  NA  0  NA  0.5365237  0   0.000000  0.2542712  0  0.0000000  1.5934885  0  0.9371405  0.7094660  0  0.5168547  0.2542712  0 -0.3079534
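Binding the completed data sets side by side with cbind() repeats the column names x1 x2 int five times, which makes the output above hard to read. A long layout with an imputation index is easier to summarize; mice itself can produce one with complete(rr, action = "long"). The base-R sketch below assumes the imputations are already available as a list of data frames (here, rows 20 and 31 typed in from the output above) and counts, per imputation, the rows whose int disagrees with x1 * x2.

```r
# x1, x2, int for rows 20 and 31 in each of the five mice imputations,
# copied from the printed output (one data frame per imputation)
imps <- list(
  data.frame(id = c(20, 31), x1 = c(0.5168547, 0.5168547), x2 = 0, int = c(-0.162311, 0)),
  data.frame(id = c(20, 31), x1 = c(0.6203798, 0.1495920), x2 = 0, int = c(0, -0.3240866)),
  data.frame(id = c(20, 31), x1 = c(0.8881394, 2.3305120), x2 = 0, int = c(0, 1.6324456)),
  data.frame(id = c(20, 31), x1 = c(0.9371405, 1.1207127), x2 = 0, int = c(0.8248701, 0.8544517)),
  data.frame(id = c(20, 31), x1 = c(0.5855288, 0.5674033), x2 = 0, int = c(0, 0))
)

# Stack the completed data sets in long format with an imputation index .imp
long <- do.call(rbind, Map(function(d, i) cbind(.imp = i, d), imps, seq_along(imps)))

# Flag rows whose imputed interaction disagrees with x1 * x2
long$mismatch <- abs(long$int - long$x1 * long$x2) > 1e-8

# Number of inconsistent rows per imputation
counts <- tapply(long$mismatch, long$.imp, sum)
print(counts)   # imputations 1..5: 1 1 1 2 0
```

For these two rows, imputations 1 to 4 each contain at least one int value that is nonzero although x2 is 0.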
Source: https://stackoverflow.com/questions/17628481/interactions-terms-in-multiple-imputations-amelia-or-other-mi-packages