Function to count NA values at each level of a factor

Posted by 心不动则不痛 on 2019-11-29 08:35:13

Are you looking for something like this?

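aggregate(cbind(length, width, height) ~ age + sex + size, data = data,
          FUN=function(x) sum(is.na(x)), na.action = na.pass)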
  age sex  size length width height
1  ad   f small      3     4      4
2 juv   m large      5     4      4

Use aggregate:

# Count the NA values of var within each level of factor
nacheck <- function(var, factor)
    aggregate(var, list(factor), function(x) sum(is.na(x)))

nacheck(data$length, data$age)
nacheck(data$length, data$sex)
nacheck(data$length, data$size)
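
To try nacheck without the original data, here is a minimal sketch on a made-up data frame. The name toy and all of its values are assumptions for illustration only, not the data from the question.

```r
# Hypothetical toy data; not the data from the question.
toy <- data.frame(
  age    = factor(c("ad", "ad", "juv", "juv", "juv")),
  length = c(1.2, NA, NA, 3.4, NA)
)

# Count the NA values of var within each level of factor
nacheck <- function(var, factor)
    aggregate(var, list(factor), function(x) sum(is.na(x)))

# One row per level of age: "ad" has 1 missing length, "juv" has 2.
# aggregate() names the grouping column Group.1 and the count column x.
res <- nacheck(toy$length, toy$age)
print(res)
```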

You could also apply this across the measurement columns of your data frame, one factor at a time, to get the NA counts for length, width and height at each level of that factor.

apply(data[,c("length","width","height")], 2, nacheck, factor=data$age)
apply(data[,c("length","width","height")], 2, nacheck, factor=data$sex)
apply(data[,c("length","width","height")], 2, nacheck, factor=data$size)
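
If a single compact table is more convenient, one possible base R variant (not part of the answer above) combines sapply with tapply to produce a matrix with one row per factor level and one column per measure. The data frame toy and its values are made up for illustration.

```r
# Hypothetical toy data; not the data from the question.
toy <- data.frame(
  age    = factor(c("ad", "ad", "juv", "juv", "juv")),
  length = c(1.2, NA, NA, 3.4, NA),
  width  = c(NA, NA, 2.0, 2.1, 2.2),
  height = c(5, 6, 7, NA, 9)
)

measures <- c("length", "width", "height")

# For each measure column, sum the NA indicator within each level of age.
# sapply() binds the per-column results into a levels-by-measures matrix.
counts <- sapply(toy[measures], function(v) tapply(is.na(v), toy$age, sum))
print(counts)
```

Swapping toy$age for another factor column gives the same table for that factor.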

To do all of this in one function, nest nacheck inside a wrapper and lapply over the factors:

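exploreNA <- function(df, factors){
    # Count the NA values of var within each level of factor
    nacheck <- function(var, factor)
        aggregate(var, list(factor), function(x) sum(is.na(x)))
    # For every factor, run nacheck on each column of df
    lapply(factors, function(x) apply(df, 2, nacheck, factor=x))
}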

exploreNA(data[,c("length","width","height")], list(data$age, data$sex, data$size))

A data.table approach:

library(data.table)
DT <- data.table(data)
# .SD holds every column not named in by
DT[, lapply(.SD, function(x) sum(is.na(x))), by = list(age,sex,size)]
##    age sex  size length width height
## 1: juv   m large      5     4      4
## 2:  ad   f small      3     4      4

And the plyr equivalent, using colwise and ddply:

library(plyr)
ddply(data, .(age,sex,size), colwise(.fun = function(x) sum(is.na(x))))
##   age sex  size length width height
## 1  ad   f small      3     4      4
## 2 juv   m large      5     4      4

You could also pass a character vector of column names as the by component:

by.cols <- c('age', 'sex' ,'size')
# then the following will work....
DT[, lapply(.SD, function(x) sum(is.na(x))), by = by.cols]
ddply(data, by.cols, colwise(.fun = function(x) sum(is.na(x))))