Question:
I'm using the package mice in R to do multiple imputation. I've done several imputations with only numerical variables, using predictive mean matching, and when I run stripplot(name of imputed dataset) I can see the observed and imputed values of all the variables.
The problem occurs when I impute a combination of categorical and numerical variables. The imputation method is then PMM for the numerical variables and logistic regression for the categorical ones, and stripplot only shows me the numerical variables. I tried to specify the variable with these commands, where edu is a categorical variable with 2 values:
stripplot(imp, imp$edu)
stripplot(imp, names(imp$edu))
And I got this error:
Error in stripplot.mids(imp, imp$edu) : Cannot pad extended formula.
Does anyone know how I can plot the observed and imputed values for both the numerical and the categorical variables?
Answer 1:
One thing you can try is to retrieve the imputed datasets as a data.frame and use ordinary plotting functions. First, retrieve the datasets, including the original dataset with missing values (imp is the mids object, i.e. the result of running mice):
impL <- complete(imp, "long", include = TRUE)
Next, add a factor indicating which rows come from the imputed datasets:
impL$Imputed <- factor(impL$.imp >0,labels = c("Observed","Imputed"))
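A minimal sketch of these two steps run end to end, assuming the nhanes2 example data that ships with mice (hyp is a two-level factor, bmi and chl are numeric). Note that `.imp > 0` labels whole datasets: every row of an imputed dataset counts as "Imputed", including values that were actually observed. The extra column hyp_imputed (a name chosen here purely for illustration) flags only the cells that were filled in; it assumes the long format stacks the original data and then the m completed datasets in the original row order, which the stopifnot() line checks.

```r
library(mice)

# nhanes2 ships with mice: hyp is a factor, bmi and chl are numeric
imp <- mice(nhanes2, m = 5, seed = 123, printFlag = FALSE)

# Stack the original data (.imp == 0) and the m completed datasets
impL <- complete(imp, "long", include = TRUE)

# Whole-dataset label, as in the answer: FALSE -> "Observed", TRUE -> "Imputed"
impL$Imputed <- factor(impL$.imp > 0, labels = c("Observed", "Imputed"))

# Check the assumed layout: blocks ordered by .imp, each in the original row order
stopifnot(all(impL$.imp == rep(0:imp$m, each = nrow(nhanes2))))

# Hypothetical per-cell flag: TRUE only where hyp was missing and got imputed
impL$hyp_imputed <- impL$.imp > 0 & rep(is.na(nhanes2$hyp), imp$m + 1)

table(impL$Imputed)
table(impL$hyp_imputed)
```

Subsetting on hyp_imputed instead of Imputed compares the observed values with only the values mice actually generated.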
Then you can use ordinary plotting functions for each variable, which has the benefit that you can create nicer plots. For example, to create a bar plot of a categorical variable with ggplot (package ggplot2):
library(ggplot2)
ggplot(impL[which(!is.na(impL$var1)), ], aes(x = var1)) +
  geom_bar(aes(y = after_stat(prop), group = Imputed)) +
  facet_wrap(~ Imputed, ncol = 1, nrow = 2)
The !is.na() filter avoids plotting a bar for NA. var1 is the variable you want to plot. For a continuous variable you might create a density plot instead:
ggplot(impL, aes(x = var2, colour = Imputed)) + geom_density()
To look at each imputed dataset separately, add group = .imp inside aes(). Hope this helps.
Answer 2:
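A self-contained sketch of both plots, assuming the nhanes2 data from mice (factor hyp and numeric chl stand in for the placeholders var1 and var2). It writes the bar height as after_stat(prop), the current ggplot2 spelling of ..prop.. (ggplot2 3.3.0 or later), and draws one density curve per dataset by adding group = .imp.

```r
library(mice)
library(ggplot2)

imp <- mice(nhanes2, m = 5, seed = 123, printFlag = FALSE)
impL <- complete(imp, "long", include = TRUE)
impL$Imputed <- factor(impL$.imp > 0, labels = c("Observed", "Imputed"))

# Categorical variable: proportion of each level within each group, NA rows dropped
p_bar <- ggplot(impL[!is.na(impL$hyp), ], aes(x = hyp)) +
  geom_bar(aes(y = after_stat(prop), group = Imputed)) +
  facet_wrap(~ Imputed, ncol = 1)

# Continuous variable: one curve per dataset (.imp == 0 is the original data)
p_dens <- ggplot(impL, aes(x = chl, colour = Imputed, group = .imp)) +
  geom_density(na.rm = TRUE)

print(p_bar)
print(p_dens)
```

Setting group = Imputed in geom_bar() makes the proportions sum to 1 within each panel; without it, each bar would be its own group and every proportion would be 1.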
I just had a similar issue, so I figured I might post an answer that achieves your goal without having to extract the imputed data.
library(mice)
library(lattice)  # provides the stripplot() generic
set.seed(1)  # make the example reproducible
# Create a dataset holding numerical and categorical data
a <- as.factor(rbinom(100, 1, 0.5))
b <- rnorm(100, 5, 1)
df <- data.frame(a, b)
# Randomly assign 10 NA values to each column
df$a[sample(length(df$a), 10)] <- NA
df$b[sample(length(df$b), 10)] <- NA
# Impute b with pmm and a with logreg
init <- mice(df, maxit = 0)
meth <- init$method
meth["a"] <- "logreg"
imp <- mice(df, method = meth)
# This only plots b, the numerical variable
stripplot(imp)
# This plots both a and b
stripplot(imp, a + b ~ .imp)
Source: https://stackoverflow.com/questions/50565145/stripplot-in-mice