Replace NA values with median by group

╄→гoц情女王★ submitted on 2021-02-05 11:57:42

Question


I used the tapply call below to get the median of Age for each Pclass.

How can I now replace the NA values in Age with the median of the corresponding Pclass?

tapply(titan_train$Age, titan_train$Pclass, median, na.rm = TRUE)

Desired output


Answer 1:


Here is another base R approach that uses replace and ave.

df1 <- transform(df1,
                 Age = ave(Age, Pclass, FUN = function(x) replace(x, is.na(x), median(x, na.rm = TRUE))))
df1
#   Pclass Age
# 1      A   1
# 2      A   2
# 3      A   3
# 4      B   4
# 5      B   5
# 6      B   6
# 7      C   7
# 8      C   8
# 9      C   9
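
The replace and ave combination above can be wrapped in a small helper. This is only a sketch: the function name impute_median_by_group and the sample vectors are made up for illustration and are not part of the original answer. It also shows one edge case worth knowing about: if every value in a group is NA, the group median is itself NA, so those entries stay missing.

```r
# Hypothetical helper: fill NAs in `x` with the median of the
# non-missing values that share the same group label in `g`.
impute_median_by_group <- function(x, g) {
  ave(x, g, FUN = function(v) replace(v, is.na(v), median(v, na.rm = TRUE)))
}

# Illustrative data; group "C" has no observed ages at all.
age    <- c(1, NA, 3, 4, NA, 6, NA, NA)
pclass <- c("A", "A", "A", "B", "B", "B", "C", "C")

filled <- impute_median_by_group(age, pclass)
filled
```

Groups A and B are filled with their own medians, while the two values in group C remain NA because there is nothing to compute a median from.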

The same idea using data.table:

library(data.table)
setDT(df1)
df1[, Age := as.integer(replace(Age, is.na(Age), median(Age, na.rm = TRUE))), by = Pclass]
df1
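
One thing to keep in mind about the as.integer() call above: it keeps Age an integer column, but it truncates any median that is not a whole number. The sketch below uses made-up values (not the data from this answer) to show the difference. If fractional medians matter, converting Age to numeric before imputing is one possible alternative.

```r
# Illustrative values: two observed ages, so the median falls between them.
age <- c(1L, NA, 4L)
med <- median(age, na.rm = TRUE)   # midpoint of 1 and 4

# What the as.integer() wrapper does: the fractional part is dropped.
as_int <- as.integer(replace(age, is.na(age), med))

# Keeping the result numeric preserves the fractional median.
as_num <- as.numeric(replace(age, is.na(age), med))

as_int
as_num
```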

data

df1 <- data.frame(Pclass = rep(LETTERS[1:3], each = 3),
                  Age = 1:9)
df1$Age[c(FALSE, TRUE, FALSE)] <- NA
df1
#   Pclass Age
# 1      A   1
# 2      A  NA
# 3      A   3
# 4      B   4
# 5      B  NA
# 6      B   6
# 7      C   7
# 8      C  NA
# 9      C   9



Answer 2:


Try the following.

set.seed(1)
df1 <- data.frame(Pclass = sample(1:3, 20, TRUE),
                  Age = sample(c(NA, 20:40), 20, TRUE, prob = c(10, rep(1, 21))))

new <- ave(df1$Age, df1$Pclass, FUN = function(x) median(x, na.rm = TRUE))
df1$Age[is.na(df1$Age)] <- new[is.na(df1$Age)]

Final clean up.

rm(new)
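
The same two-step idea also works with the tapply result from the question, by using the group labels as names to look up each missing row's median. This is a sketch under assumptions: the data frame df and its values are invented for illustration, and it relies on tapply returning a vector named by the group levels.

```r
# Illustrative data with one missing Age per class.
df <- data.frame(Pclass = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 Age    = c(20, NA, 30, 40, NA, 50, 60, NA, 70))

# Per-class medians, named "1", "2", "3".
med <- tapply(df$Age, df$Pclass, median, na.rm = TRUE)

# Look up the median for each missing row by its class label.
miss <- is.na(df$Age)
df$Age[miss] <- as.vector(med[as.character(df$Pclass[miss])])

df
```

Converting Pclass to character matters here: indexing med with the raw numbers would select by position rather than by class label, which only happens to coincide when the classes are 1, 2, 3 in order.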


Source: https://stackoverflow.com/questions/53570721/replace-na-values-with-median-by-group
