I have a dataframe that has a scattering of NAs
toy_df
# Y X1 X2 Label
# 5 3 3 A
# 3 NA 2 B
# 3 NA NA C
# 2 NA 6 B
I want to group by Label and count the non-NA values in each column.
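For anyone who wants to run the snippets below, the printed data can be rebuilt roughly as follows. This is only a sketch: the column types (numeric for Y, X1, X2 and character for Label) are an assumption, since the question shows only the printed output.

```r
# Rebuild the example data shown above.
# Column types are assumed; only the printed values come from the question.
toy_df <- data.frame(
  Y     = c(5, 3, 3, 2),
  X1    = c(3, NA, NA, NA),
  X2    = c(3, 2, NA, 6),
  Label = c("A", "B", "C", "B"),
  stringsAsFactors = FALSE
)
toy_df
```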
We can use data.table. Convert the 'data.frame' to a 'data.table' (setDT(toy_df)), group by 'Label', loop through the Subset of Data.table (.SD), and get the sum of the non-NA values (!is.na(x)).
library(data.table)
setDT(toy_df)[, lapply(.SD, function(x) sum(!is.na(x))), by = Label]
# Label Y X1 X2
#1: A 1 1 1
#2: B 2 0 2
#3: C 1 0 0
Or with dplyr, using the same methodology
library(dplyr)
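toy_df %>%
  group_by(Label) %>%
  # across() needs dplyr >= 1.0.0; older versions used
  # summarise_each(funs(sum(!is.na(.)))), which is now deprecated
  summarise(across(everything(), ~ sum(!is.na(.x))))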
Or a base R option with by and colSums, grouped by the 4th column, applied to the logical matrix (!is.na(toy_df[-4]))
by(!is.na(toy_df[-4]), toy_df[4], FUN = colSums)
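The by call returns a list-like object holding one colSums vector per Label. If a single matrix is easier to work with, one possible follow-up is to bind the pieces by row. This is a sketch and not part of the original answer; toy_df is rebuilt here with assumed column types so the snippet runs on its own.

```r
# Example data, rebuilt with assumed column types
toy_df <- data.frame(
  Y     = c(5, 3, 3, 2),
  X1    = c(3, NA, NA, NA),
  X2    = c(3, 2, NA, 6),
  Label = c("A", "B", "C", "B"),
  stringsAsFactors = FALSE
)

# One vector of non-NA counts per Label
res <- by(!is.na(toy_df[-4]), toy_df[4], FUN = colSums)

# Stack the per-group vectors into a matrix with one row per Label
counts <- do.call(rbind, res)
counts
```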
Or with rowsum, using a similar approach to the one with by, except with the rowsum function.
rowsum(+(!is.na(toy_df[-4])), group=toy_df[,4])
# Y X1 X2
#A 1 1 1
#B 2 0 2
#C 1 0 0
Or in base R
aggregate(toy_df[,1:3], by=list(toy_df$Label), FUN=function(x) { sum(!is.na(x))})
# Note: the grouping column is 'Label' (capital L); toy_df$label would be NULL
aggregate(cbind(toy_df$Y, toy_df$X1, toy_df$X2), list(toy_df$Label),
          FUN = function(x) sum(!is.na(x)))