I have a categorical variable with three levels (A, B, and C).
I also have a continuous variable with some missing values.
I would like to replace each NA value with the mean of its group. That is, missing observations from group A have to be replaced with the mean of group A.
I know I can just calculate each group's mean and replace missing values, but I'm sure there's another way to do so more efficiently with loops.
A <- subset(data, group == "A")
# Mean of group A, ignoring missing values
mean(A$variable, na.rm = TRUE)
# Replace the missing values in group A with that mean
A$variable[is.na(A$variable)] <- mean(A$variable, na.rm = TRUE)
Now, I understand I could do the same for groups B and C, but perhaps a for loop (with if and else) might do the trick?
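For reference, the loop the question has in mind could look roughly like the sketch below. It assumes a data frame named data with columns group and variable, as in the snippet above; the toy values are made up for illustration. No if/else is needed, since logical indexing picks out the missing entries of each group.

```r
# Toy data (hypothetical values); column names follow the question.
data <- data.frame(
  group    = c("A", "A", "A", "B", "B", "C", "C"),
  variable = c(1, NA, 3, 10, NA, NA, 7)
)

# Loop over the group levels and fill each group's NAs with that group's mean.
for (g in unique(data$group)) {
  in_group   <- data$group == g
  group_mean <- mean(data$variable[in_group], na.rm = TRUE)
  data$variable[in_group & is.na(data$variable)] <- group_mean
}

data$variable
# 1 2 3 10 10 7 7
```

This sketch assumes the group column itself has no missing values; a group whose values are all NA would be filled with NaN.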
library(dplyr)
data %>%
  group_by(group) %>%
  mutate(variable = ifelse(is.na(variable), mean(variable, na.rm = TRUE), variable))
For a faster, base-R version, you can use ave:
data$variable <- ave(data$variable, data$group, FUN = function(x)
  ifelse(is.na(x), mean(x, na.rm = TRUE), x))
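As a quick check of what ave does here, the following minimal sketch applies the same call to a small made-up data frame. The name df, the column imputed, and the values are assumptions for illustration only. ave splits the vector by group, applies FUN to each piece, and returns the results in the original row order.

```r
# Toy data (hypothetical values) with one NA in each group.
df <- data.frame(
  group    = c("A", "A", "A", "B", "B", "C", "C"),
  variable = c(2, NA, 4, 5, NA, NA, 9)
)

# Same call as above, written to a new column so the original stays visible.
df$imputed <- ave(df$variable, df$group, FUN = function(x) {
  ifelse(is.na(x), mean(x, na.rm = TRUE), x)
})

df$imputed
# 2 3 4 5 5 9 9
```

Writing to a separate column is only for comparison; assigning back to df$variable works the same way.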
You could use the data.table package to achieve this:
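library(data.table)
tomean <- c("var1", "var2")  # columns to impute
setDT(dat)                   # convert dat to a data.table by reference
dat[, (tomean) := lapply(tomean, function(x) {
  x <- get(x)
  x[is.na(x)] <- mean(x, na.rm = TRUE)  # fill NAs with the group mean
  x
}), by = group]  # assumes the grouping column is named group, as in the question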
Source: https://stackoverflow.com/questions/55345593/impute-missing-data-with-mean-by-group