Replace NA values with the group value

栀梦 2020-11-27 22:15

I have a df as follows which has 20 people across 5 households. Some people within the household have missing data for whether they have a med_card or not. I want to give th

5 answers
  • 2020-11-27 22:18

    Try ave. It applies a function to groups. Have a look at ?ave for details, e.g.:

    df$med_card_new <- ave(df$med_card, df$hhold_no, FUN=function(x)unique(x[!is.na(x)]))
    
    #   person_id hhold_no med_card med_card_new
    #1          1        1        1            1
    #2          2        1        1            1
    #3          3        1       NA            1
    #4          4        1       NA            1
    #5          5        1       NA            1
    #6          6        2        0            0
    #7          7        2        0            0
    #8          8        2        0            0
    #9          9        2        0            0
    

    Please note that this only works if each household has at least one non-NA value and the non-NA values within a household do not differ (e.g. it breaks if person 1 has 1 and person 2 has 0).
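
    The two caveats can be made explicit in a small helper. The following is only a sketch under assumptions: the data frame is a made-up 7-row sample (the question's full data is not shown here), and fill_from_group is a hypothetical name, not part of base R. It leaves all-NA households untouched and stops when a household holds conflicting values.

    ```r
    # Hypothetical sample data; household 3 has no observed value at all.
    df <- data.frame(
      person_id = 1:7,
      hhold_no  = c(1, 1, 1, 2, 2, 3, 3),
      med_card  = c(1, NA, NA, 0, 0, NA, NA)
    )

    # Hypothetical helper: fill NAs with the single observed value of the group.
    fill_from_group <- function(x) {
      vals <- unique(x[!is.na(x)])
      if (length(vals) == 0) return(x)   # all NA: nothing to fill from
      if (length(vals) > 1) stop("conflicting values within a group")
      replace(x, is.na(x), vals)         # only the NA positions are changed
    }

    df$med_card_new <- ave(df$med_card, df$hhold_no, FUN = fill_from_group)
    df$med_card_new
    # 1 1 1 0 0 NA NA
    ```

    Whether a conflict should raise an error or be resolved some other way (for example by taking the most frequent value) depends on the data, so treat the stop() call as a placeholder.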

  • 2020-11-27 22:30

    data.table solution

    library(data.table)
    setDT(df)[, med_card2 := unique(med_card[!is.na(med_card)]), by = hhold_no]
    
    #     person_id hhold_no med_card med_card2
    #  1:         1        1        1         1
    #  2:         2        1        1         1
    #  3:         3        1       NA         1
    #  4:         4        1       NA         1
    #  5:         5        1       NA         1
    #  6:         6        2        0         0
    #  7:         7        2        0         0
    #  8:         8        2        0         0
    #  9:         9        2        0         0
    # 10:        10        3       NA         1
    # 11:        11        3       NA         1
    # 12:        12        3       NA         1
    # 13:        13        3        1         1
    # 14:        14        3        1         1
    # 15:        15        4        1         1
    # 16:        16        4        1         1
    # 17:        17        5        1         1
    # 18:        18        5        1         1
    # 19:        19        5       NA         1
    # 20:        20        5       NA         1
    
  • 2020-11-27 22:30

    This is late, but if you are working with a numeric column, try this:

    library(data.table)
    
    # Assign the household mean of the non-missing values to every row of that household
    setDT(df)[, med_card_new := mean(med_card, na.rm = TRUE), by = hhold_no]
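
    One thing to keep in mind with a group mean: it only reproduces the household value when the observed values agree. A minimal base R sketch on made-up vectors (ave stands in for the data.table grouping here, purely to keep the example dependency-free):

    ```r
    # Hypothetical data: household 2 contains conflicting entries (1 and 0).
    hhold_no <- c(1, 1, 2, 2)
    med_card <- c(1, NA, 1, 0)

    # Group mean of the non-missing values, assigned to every row of the group
    group_mean <- ave(med_card, hhold_no, FUN = function(x) mean(x, na.rm = TRUE))
    group_mean
    # 1.0 1.0 0.5 0.5
    ```

    Household 2 ends up with 0.5, which is not a valid 0/1 flag, so checking for conflicts first is probably worthwhile.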
    
  • 2020-11-27 22:31

    That is exactly what na.aggregate in the zoo package does:

    library(zoo)
    
    transform(df, med_card_new = na.aggregate(med_card, by = hhold_no))
    

    This uses mean by default; however, you can specify any function you like. For example, if you prefer to return NA when all items in a group are NA (rather than NaN, which is what mean returns when given a zero-length vector), then use:

    meanNA <- function(x, ...) if (all(is.na(x))) NA else mean(x, ...)
    transform(df, med_card_new = na.aggregate(med_card, by = hhold_no, FUN = meanNA))
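
    To see the NaN versus NA difference without installing zoo, here is a rough base R approximation. It is not zoo's implementation: fill_na_by_group is a hypothetical helper that, as an assumption, passes only the non-NA values of each group to FUN and writes the result into the NA positions.

    ```r
    meanNA <- function(x, ...) if (all(is.na(x))) NA else mean(x, ...)

    mean(numeric(0))     # NaN
    meanNA(numeric(0))   # NA

    # Hypothetical helper: replace NAs by a statistic of the group's non-NA values
    fill_na_by_group <- function(x, g, FUN = mean) {
      ave(x, g, FUN = function(v) replace(v, is.na(v), FUN(v[!is.na(v)])))
    }

    x <- c(1, NA, NA, NA)   # group 2 is entirely NA
    g <- c(1, 1, 2, 2)

    with_mean   <- fill_na_by_group(x, g)                 # 1 1 NaN NaN
    with_meanNA <- fill_na_by_group(x, g, FUN = meanNA)   # 1 1 NA  NA
    ```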
    
  • 2020-11-27 22:35

    Using dplyr, you could also group_by() and then use a function that takes an na.rm argument, such as max, to return the non-missing value for each group.

    library(dplyr)
    df %>% group_by(hhold_no) %>% mutate(med_card_new = max(med_card, na.rm = TRUE))
    

    Given that non-missings in a group are numeric and constant, you could also use mean or min instead of max.
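
    A household in which every value is NA is an edge case for this approach: max with na.rm = TRUE then operates on an empty set. The base R sketch below shows the behaviour and one possible guard; max_or_na is a hypothetical helper, and it assumes the column is numeric.

    ```r
    all_na <- c(NA_real_, NA_real_)

    # max() over no values returns -Inf and emits a warning
    suppressWarnings(max(all_na, na.rm = TRUE))   # -Inf

    # Hypothetical guard: return NA when there is nothing to take the max of
    max_or_na <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

    max_or_na(all_na)        # NA
    max_or_na(c(NA, 1, 1))   # 1
    ```

    The helper could then be used inside mutate() in place of the bare max call, e.g. med_card_new = max_or_na(med_card).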
