R group by, counting non-NA values

刺人心 2020-12-11 07:32

I have a data frame with a scattering of NAs:

toy_df
# Y  X1 X2 Label
# 5  3  3  A
# 3  NA 2  B
# 3  NA NA C
# 2  NA 6  B
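
For anyone who wants to run the answers, the printed data can be rebuilt roughly as follows. This is only a sketch: the column types (numeric for Y, X1 and X2, character for Label) are assumed from the printout and are not stated in the question.

```r
# Assumed reconstruction of the data frame printed above.
# stringsAsFactors = FALSE keeps Label as character on R versions before 4.0.
toy_df <- data.frame(
  Y     = c(5, 3, 3, 2),
  X1    = c(3, NA, NA, NA),
  X2    = c(3, 2, NA, 6),
  Label = c("A", "B", "C", "B"),
  stringsAsFactors = FALSE
)
```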

I want to group by Label and count the non-NA values in each of the other columns.

3 Answers
  •  囚心锁ツ
    2020-12-11 07:51

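    We can use data.table. Convert the data.frame to a data.table in place with setDT(toy_df), group by Label, then loop over the columns of the Subset of Data.table (.SD) and sum the non-NA values in each one (sum(!is.na(x))). Because setDT() modifies toy_df by reference, run the base R options further down on the original data.frame (or call setDF(toy_df) first): toy_df[-4] selects rows, not columns, on a data.table.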

    library(data.table)
    setDT(toy_df)[, lapply(.SD, function(x) sum(!is.na(x))), by = Label]
    #   Label Y X1 X2
    #1:     A 1  1  1
    #2:     B 2  0  2
    #3:     C 1  0  0
    

    Or with dplyr using the same methodology

    library(dplyr)
    toy_df %>%
      group_by(Label) %>%
      # across() replaces the deprecated summarise_each()/funs(); needs dplyr >= 1.0.0
      summarise(across(everything(), ~ sum(!is.na(.x))))
    

    Or a base R option: apply by() with colSums to the logical matrix !is.na(toy_df[-4]), grouped by the 4th column (Label).

    by(!is.na(toy_df[-4]), toy_df[4], FUN = colSums)
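
    The by() call returns one named vector per group rather than a single table. If a matrix like the other outputs is wanted, the pieces can be bound together. This is a minimal sketch, assuming toy_df is still a plain data.frame; the toy_df construction and the name counts are illustrative and not part of the original answer.

    ```r
    # Assumed reconstruction of the question's data
    toy_df <- data.frame(
      Y     = c(5, 3, 3, 2),
      X1    = c(3, NA, NA, NA),
      X2    = c(3, 2, NA, 6),
      Label = c("A", "B", "C", "B"),
      stringsAsFactors = FALSE
    )

    # by() yields a list-like object with one colSums() vector per Label
    res <- by(!is.na(toy_df[-4]), toy_df[4], FUN = colSums)

    # Stack the per-group vectors into one matrix (rows = labels, columns = variables)
    counts <- do.call(rbind, res)
    counts
    ```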
    

    Or with rowsum(), using the same approach as by(). The unary + converts the logical matrix to integer, because rowsum() only accepts numeric input.

    rowsum(+(!is.na(toy_df[-4])), group=toy_df[,4])
    #  Y X1 X2
    #A 1  1  1
    #B 2  0  2
    #C 1  0  0
    
