I have a df as follows which has 20 people across 5 households. Some people within the household have missing data for whether they have a med_card or not. I want to give th
Try ave. It applies a function to groups. Have a look at ?ave for details, e.g.:
df$med_card_new <- ave(df$med_card, df$hhold_no, FUN=function(x)unique(x[!is.na(x)]))
# person_id hhold_no med_card med_card_new
#1 1 1 1 1
#2 2 1 1 1
#3 3 1 NA 1
#4 4 1 NA 1
#5 5 1 NA 1
#6 6 2 0 0
#7 7 2 0 0
#8 8 2 0 0
#9 9 2 0 0
Please note that this only works if at least one value in each household is not NA and the non-missing values within a household do not differ (e.g. person 1 == 1, person 2 == 0 would break it).
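If either of those conditions might fail in your data, a more defensive variant is possible. The following is only a sketch in base R: the helper name fill_if_unanimous and the small example data frame are made up for illustration, and it assumes you would rather leave a household untouched than guess when it is all NA or has conflicting values.

```r
# Hypothetical example data: household 3 is entirely NA,
# household 4 has conflicting non-missing values (1 and 0).
df <- data.frame(
  person_id = 1:10,
  hhold_no  = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 4),
  med_card  = c(1, NA, NA, 0, 0, NA, NA, 1, 0, NA)
)

# Fill NAs only when the household has exactly one distinct non-missing value;
# otherwise return the group unchanged.
fill_if_unanimous <- function(x) {
  vals <- unique(x[!is.na(x)])
  if (length(vals) == 1) {
    x[is.na(x)] <- vals
  }
  x
}

df$med_card_new <- ave(df$med_card, df$hhold_no, FUN = fill_if_unanimous)
df
```

Households 1 and 2 get filled, while households 3 and 4 keep their NAs so they can be inspected by hand.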
A data.table solution:
library(data.table)
setDT(df)[, med_card2 := unique(med_card[!is.na(med_card)]), by = hhold_no]
# person_id hhold_no med_card med_card2
# 1: 1 1 1 1
# 2: 2 1 1 1
# 3: 3 1 NA 1
# 4: 4 1 NA 1
# 5: 5 1 NA 1
# 6: 6 2 0 0
# 7: 7 2 0 0
# 8: 8 2 0 0
# 9: 9 2 0 0
# 10: 10 3 NA 1
# 11: 11 3 NA 1
# 12: 12 3 NA 1
# 13: 13 3 1 1
# 14: 14 3 1 1
# 15: 15 4 1 1
# 16: 16 4 1 1
# 17: 17 5 1 1
# 18: 18 5 1 1
# 19: 19 5 NA 1
# 20: 20 5 NA 1
This is late, but if you are working on a numeric column, try this:
library(data.table)
# Fill a new column with the household mean of the non-missing values
setDT(df)[, med_card_new := mean(med_card, na.rm = TRUE), by = hhold_no]
That is exactly what na.aggregate in the zoo package does:
library(zoo)
transform(df, med_card_new = na.aggregate(med_card, by = hhold_no))
This uses mean by default; however, you can specify any function you like. For example, if you prefer to return NA when all items in a group are NA (rather than NaN, which is what mean would return if given a zero-length vector), then use:
meanNA <- function(x, ...) if (all(is.na(x))) NA else mean(x, ...)
transform(df, med_card_new = na.aggregate(med_card, by = hhold_no, FUN = meanNA))
Using dplyr, you could also group_by() and then take advantage of a function such as max with an na.rm argument to return a numeric value for every row in each group.
library(dplyr)
df %>% group_by(hhold_no) %>% mutate(med_card_new = max(med_card, na.rm = TRUE))
Given that the non-missing values in a group are numeric and constant, you could also use mean or min instead of max.
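One caveat with the max approach: if every value in a household is NA, max(..., na.rm = TRUE) is called on an empty vector and returns -Inf with a warning. A minimal sketch of a guard, written in base R so it runs without dplyr (the helper name max_or_na is hypothetical, not part of any package):

```r
# Return NA for an all-missing group instead of -Inf.
max_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
}

all_missing <- c(NA_real_, NA_real_)

# Unguarded: yields -Inf (and a warning if not suppressed).
unguarded <- suppressWarnings(max(all_missing, na.rm = TRUE))

# Guarded: yields NA for the all-missing group, the usual max otherwise.
guarded_missing <- max_or_na(all_missing)
guarded_normal  <- max_or_na(c(NA, 1, 1))
```

Inside the grouped pipeline this would be used as mutate(med_card_new = max_or_na(med_card)) after group_by(hhold_no).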