R group by, counting non-NA values

刺人心 2020-12-11 07:32

I have a data frame with NA values scattered throughout it:

toy_df
# Y  X1 X2 Label
# 5  3  3  A
# 3  NA 2  B
# 3  NA NA C
# 2  NA 6  B
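The printout above can be rebuilt as follows so the answers can be run as-is. This is a sketch: the column types (numeric Y, X1, X2 and a character Label) are assumed from the printout, not stated in the question.

```r
# Rebuild the printed toy data; column types are inferred from the printout (assumption)
toy_df <- data.frame(
  Y     = c(5, 3, 3, 2),
  X1    = c(3, NA, NA, NA),
  X2    = c(3, 2, NA, 6),
  Label = c("A", "B", "C", "B"),
  stringsAsFactors = FALSE
)
# Total number of missing cells: 3 in X1 plus 1 in X2
sum(is.na(toy_df))
#> [1] 4
```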

I want to group by Label and count the non-NA values in each of the other columns.
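All of the answers rest on one building block: !is.na(x) returns a logical vector, and sum() counts TRUE as 1 and FALSE as 0. A minimal illustration on the X2 column, with split() and sapply() standing in as one plain base R way to do the grouping (x2 and labels are illustrative names, not from the question):

```r
# is.na() flags missing entries; negating it flags the present ones
x2 <- c(3, 2, NA, 6)          # the X2 column of the toy data
!is.na(x2)
#> [1]  TRUE  TRUE FALSE  TRUE

# sum() coerces TRUE/FALSE to 1/0, so this counts the non-NA values
sum(!is.na(x2))
#> [1] 3

# Grouped version: split the column by label, then count per group
labels <- c("A", "B", "C", "B")
sapply(split(x2, labels), function(v) sum(!is.na(v)))
#> A B C
#> 1 2 0
```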

3 answers
  • 2020-12-11 07:51

    We can use data.table: convert the data.frame to a data.table with setDT(toy_df), group by Label, loop over the columns of .SD (the Subset of Data for each group), and count the non-NA values with sum(!is.na(x)).

    library(data.table)
    setDT(toy_df)[, lapply(.SD, function(x) sum(!is.na(x))), by = Label]
    #   Label Y X1 X2
    #1:     A 1  1  1
    #2:     B 2  0  2
    #3:     C 1  0  0
    

    Or with dplyr, using the same approach:

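    library(dplyr)
    # summarise_each() and funs() are deprecated; across() is the current idiom (dplyr >= 1.0.0)
    toy_df %>%
          group_by(Label) %>%
          summarise(across(everything(), ~ sum(!is.na(.x))))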
    

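    Or a base R option: by() with colSums, applied to the logical matrix !is.na(toy_df[-4]) and grouped by the 4th column (Label). This and the rowsum option assume toy_df is still a plain data.frame; setDT() in the data.table snippet converts it by reference, after which toy_df[-4] would drop row 4 rather than column 4.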

    by(!is.na(toy_df[-4]), toy_df[4], FUN = colSums)
    

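    Or with rowsum(), the same approach as by() but summing the rows of each group directly; the unary + turns the logical matrix into an integer matrix, because rowsum() only accepts numeric input.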

    rowsum(+(!is.na(toy_df[-4])), group=toy_df[,4])
    #  Y X1 X2
    #A 1  1  1
    #B 2  0  2
    #C 1  0  0
    
  • 2020-12-11 07:52

    Or in base R with aggregate():

    aggregate(toy_df[, 1:3], by = list(Label = toy_df$Label), FUN = function(x) sum(!is.na(x)))
    
  • 2020-12-11 08:08
    aggregate(cbind(Y = toy_df$Y, X1 = toy_df$X1, X2 = toy_df$X2), list(Label = toy_df$Label),
              FUN = function(x) sum(!is.na(x)))
    
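One caveat for the aggregate() answers, which use the data-frame interface: aggregate() also has a formula method, and its default na.action = na.omit drops every row containing an NA in any column before grouping. For a count of non-NA values that silently gives the wrong result; passing na.action = na.pass keeps those rows. A sketch that rebuilds the toy data so it runs standalone (count_non_na, dropped, and kept are illustrative names):

```r
toy_df <- data.frame(
  Y = c(5, 3, 3, 2), X1 = c(3, NA, NA, NA), X2 = c(3, 2, NA, 6),
  Label = c("A", "B", "C", "B"), stringsAsFactors = FALSE
)
count_non_na <- function(x) sum(!is.na(x))

# Default na.action = na.omit: only complete rows survive (here just the "A" row)
dropped <- aggregate(. ~ Label, data = toy_df, FUN = count_non_na)

# na.action = na.pass keeps the NA rows so FUN can count around them
kept <- aggregate(. ~ Label, data = toy_df, FUN = count_non_na, na.action = na.pass)
kept
#>   Label Y X1 X2
#> 1     A 1  1  1
#> 2     B 2  0  2
#> 3     C 1  0  0
```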
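A variant of the rowsum() answer that picks the value columns by name instead of by position, so it keeps working if Label is not the 4th column. This is a sketch that rebuilds the toy data so it runs standalone; present and counts are illustrative names:

```r
toy_df <- data.frame(
  Y = c(5, 3, 3, 2), X1 = c(3, NA, NA, NA), X2 = c(3, 2, NA, 6),
  Label = c("A", "B", "C", "B"), stringsAsFactors = FALSE
)
# Logical matrix: TRUE where a value is present (Label excluded by name, not position)
present <- !is.na(toy_df[setdiff(names(toy_df), "Label")])
typeof(present)
#> [1] "logical"

# rowsum() requires numeric input; unary + turns TRUE/FALSE into 1L/0L
typeof(+present)
#> [1] "integer"

counts <- rowsum(+present, group = toy_df$Label)
counts
#>   Y X1 X2
#> A 1  1  1
#> B 2  0  2
#> C 1  0  0
```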