Replace NA values with median by group

离开以前 2021-01-28 17:51

I have used tapply to get the median of Age for each Pclass.

How can I now replace the NA values in Age with the median of the corresponding Pclass?
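
The tapply call referred to above is not shown here. A minimal sketch of what it might look like, and of one way to feed its result back into the NA positions, assuming a data frame df1 with columns Pclass and Age (the sample values and the helper names med and na_rows are made up for illustration):

```r
# Hypothetical sample data: three classes, one missing Age in each
df1 <- data.frame(Pclass = rep(LETTERS[1:3], each = 3),
                  Age = c(1, NA, 3, 4, NA, 6, 7, NA, 9))

# Median Age per Pclass; the result is indexed by class name
med <- tapply(df1$Age, df1$Pclass, median, na.rm = TRUE)

# Look up each NA row's class median by name and fill it in
na_rows <- is.na(df1$Age)
df1$Age[na_rows] <- med[as.character(df1$Pclass[na_rows])]
```

The as.character() call is there so the lookup goes by class name even if Pclass is stored as a factor.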

2 Answers
  •  夕颜 (original poster)
    2021-01-28 18:10

    Here is another base R approach that uses replace and ave.

    # Within each Pclass, replace missing ages with that group's median
    df1 <- transform(df1,
                     Age = ave(Age, Pclass,
                               FUN = function(x) replace(x, is.na(x), median(x, na.rm = TRUE))))
    df1
    #   Pclass Age
    # 1      A   1
    # 2      A   2
    # 3      A   3
    # 4      B   4
    # 5      B   5
    # 6      B   6
    # 7      C   7
    # 8      C   8
    # 9      C   9
    

    The same idea, using data.table:

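    library(data.table)
    setDT(df1)  # convert df1 to a data.table by reference
    # as.integer() keeps Age an integer column; note that it truncates fractional medians
    df1[, Age := as.integer(replace(Age, is.na(Age), median(Age, na.rm = TRUE))), by = Pclass]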
    df1
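
    One caveat with the as.integer() wrapper above: when a group has an even number of non-missing ages, the median can fall on a half, and as.integer() drops the fractional part. A small base R illustration with made-up values (age, m, filled_double and filled_integer are hypothetical names):

    ```r
    # Hypothetical group of integer ages with one missing value
    age <- c(1L, 4L, NA)

    # Median of the non-missing values 1 and 4
    m <- median(age, na.rm = TRUE)

    # Same replace() pattern as above, without and with the integer cast
    filled_double  <- replace(age, is.na(age), m)              # imputed value stays fractional
    filled_integer <- as.integer(replace(age, is.na(age), m))  # fractional part is dropped
    ```

    If fractional medians matter, one option is to convert Age to numeric before imputing instead of casting the result back to integer.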
    

    Data

    df1 <- data.frame(Pclass = rep(LETTERS[1:3], each = 3),
                      Age = 1:9)
    df1$Age[c(FALSE, TRUE, FALSE)] <- NA
    df1
    #   Pclass Age
    # 1      A   1
    # 2      A  NA
    # 3      A   3
    # 4      B   4
    # 5      B  NA
    # 6      B   6
    # 7      C   7
    # 8      C  NA
    # 9      C   9
    
