Question
I am working with survey data that use probability weights and multiple imputation. I would like to get marginal effects after estimating a logit model on the imputed data sets with the survey weights. I cannot figure out how to do this in R. Stata has the mimrgns package, which makes this fairly easy. There is also this article (PDF) and its supplementary material (PDF), which give some direction, but I can't seem to apply them to my situation.
In the following example, please assume I have already imputed "income" across three data sets (i.e., df1, df2, and df3). I would like to predict "gender" using employment status (i.e., "working") and "income."
Here is a reproducible example.
library(tibble)
library(survey)
library(mitools)
library(ggeffects)
# Data set 1
# Note that I am excluding the "income" variable from the "df"s and creating
# it separately so that it varies between the data sets. This simulates the
# variation with multiple imputation. Since I am using the same seed
# (i.e., 123), all the other variables will be the same, the only one that
# will vary will be "income."
set.seed(123)
df1 <- tibble(id = seq(1, 100, by = 1),
              gender = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              pweight = sample(50:500, 100, replace = TRUE))
# Create random income variable.
set.seed(456)
income <- tibble(income = sample(0:100000, 100))
# Bind it to df1
df1 <- cbind(df1, income)
# Data set 2
set.seed(123)
df2 <- tibble(id = seq(1, 100, by = 1),
              gender = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              pweight = sample(50:500, 100, replace = TRUE))
set.seed(789)
income <- tibble(income = sample(0:100000, 100))
df2 <- cbind(df2, income)
# Data set 3
set.seed(123)
df3 <- tibble(id = seq(1, 100, by = 1),
              gender = as.factor(rbinom(n = 100, size = 1, prob = 0.50)),
              working = as.factor(rbinom(n = 100, size = 1, prob = 0.40)),
              pweight = sample(50:500, 100, replace = TRUE))
set.seed(101)
income <- tibble(income = sample(0:100000, 100))
df3 <- cbind(df3, income)
# Apply weights via svydesign
imputation <- svydesign(id = ~id,
                        weights = ~pweight,
                        data = imputationList(list(df1, df2, df3)))
# Logit model with weights and imputations
logitImp <- with(imputation, svyglm(gender ~ working + income,
                                    family = binomial()))
# Combine results across MI datasets
summary(MIcombine(logitImp))
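For context, the combining step above follows Rubin's rules: the pooled estimate is the mean of the per-imputation estimates, and the total variance adds the average within-imputation variance to an inflated between-imputation variance. A minimal base R sketch for a single scalar estimate is shown here; `pool_rubin` is a hypothetical helper written for illustration (it is not part of mitools), and the input numbers are made up.

```r
# Rubin's rules for one scalar estimate.
# pool_rubin is a hypothetical helper for illustration, not a mitools function.
pool_rubin <- function(estimates, variances) {
  m     <- length(estimates)        # number of imputed data sets
  q_bar <- mean(estimates)          # pooled point estimate
  u_bar <- mean(variances)          # average within-imputation variance
  b     <- var(estimates)           # between-imputation variance
  total <- u_bar + (1 + 1 / m) * b  # total variance of the pooled estimate
  c(estimate = q_bar, se = sqrt(total))
}

# Made-up per-imputation estimates and squared standard errors
pool_rubin(estimates = c(0.40, 0.50, 0.60),
           variances = c(0.04, 0.04, 0.04))
```

The same logic applies to any quantity that comes with an estimate and a variance in each imputed data set, which is why it is relevant to predicted probabilities as well as coefficients.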
Normally I would use library(ggeffects) to get marginal effects, but when I try it with the imputed data I get the following error:
Error in class(model) <- "lmerMod" : attempt to set an attribute on NULL
Here is an example of how I would do it without the imputation, using "df1" as the data set.
# Create a new svydesign object
noImp <- svydesign(id = ~id,
                   weights = ~pweight,
                   data = df1)
# Run model (svyglm takes the data from the design, so no data argument is needed)
logit <- svyglm(gender ~ working + income,
                family = binomial,
                design = noImp)
# Get marginal effects at the mean
ggpredict(logit, terms = "working")
Any idea how to do this with multiple imputation?
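One direction that may be worth trying, offered as a sketch rather than a confirmed solution: compute predictions within each imputed design and pool them with MIcombine. The block below rebuilds a small version of the example data so that it runs on its own. Several choices are assumptions made for illustration: the helper `make_df`, the fixed income value of 50000, pooling on the link (logit) scale before transforming back to probabilities, and the switch to `quasibinomial()`, which the survey package suggests for weighted logistic models to avoid non-integer-count warnings. It yields pooled predicted probabilities (comparable to what ggpredict reports), not average marginal effects.

```r
library(survey)
library(mitools)

# Hypothetical helper: rebuilds one "imputed" data set; only income varies by seed
make_df <- function(seed) {
  set.seed(123)
  d <- data.frame(id      = 1:100,
                  gender  = factor(rbinom(100, 1, 0.50)),
                  working = factor(rbinom(100, 1, 0.40)),
                  pweight = sample(50:500, 100, replace = TRUE))
  set.seed(seed)
  d$income <- sample(0:100000, 100)
  d
}
dfs <- lapply(c(456, 789, 101), make_df)

# Survey design over the list of imputed data sets
des <- svydesign(ids = ~id, weights = ~pweight, data = imputationList(dfs))

# One weighted logit fit per imputed data set
fits <- with(des, svyglm(gender ~ working + income, family = quasibinomial()))

# Covariate profiles to predict for: working = 0 and 1 at an arbitrary fixed income
newdat <- data.frame(working = factor(c("0", "1"), levels = c("0", "1")),
                     income  = 50000)

# Link-scale predictions with their full covariance matrix, per imputation
preds <- lapply(fits, predict, newdata = newdat, type = "link", vcov = TRUE)

# Pool across imputations (Rubin's rules), then map back to probabilities
pooled <- MIcombine(preds)
est <- coef(pooled)
se  <- sqrt(diag(vcov(pooled)))

out <- data.frame(working = newdat$working,
                  prob    = plogis(est),
                  lower   = plogis(est - 1.96 * se),
                  upper   = plogis(est + 1.96 * se))
print(out)
```

Pooling on the link scale keeps the back-transformed intervals inside [0, 1]. The normal-based 1.96 multiplier ignores the degrees of freedom that MIcombine reports, so the intervals are approximate. A difference in probabilities between the two profiles would need its own variance (for example via the delta method) before pooling.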
Source: https://stackoverflow.com/questions/48506315/marginal-effects-with-survey-weights-and-multiple-imputations