Aggregate by NA in R

闹比i 2021-01-13 11:22

Does anybody know how to aggregate by NA in R?

Take the example below:

a <- matrix(1,5,2)
a[1:2,2] <- NA
a[3:5,2] <- 2
aggregate(a[,1]         


        
5 answers
  • 2021-01-13 11:58

    Instead of aggregate(), you may want to consider rowsum(). It is designed for exactly this operation on matrices and is typically much faster than aggregate(). We can add NA to the factor levels of a[, 2] with addNA(), which ensures that NA shows up as a grouping value.

    rowsum(a[, 1], addNA(a[, 2]))
    #      [,1]
    # 2       3
    # <NA>    2
    

    If you still want to use aggregate(), you can incorporate addNA() as well.

    aggregate(a[, 1], list(Group = addNA(a[, 2])), sum)
    #   Group x
    # 1     2 3
    # 2  <NA> 2
    

    One more option, using data.table:

    library(data.table)
    as.data.table(a)[, .(x = sum(V1)), by = .(Group = V2)]
    #    Group x
    # 1:    NA 2
    # 2:     2 3
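
    To see why addNA() makes the difference, it may help to compare the factor it returns with a plain factor(). This is only a small sketch; the names g, f_plain and f_na are made up for illustration and do not appear in the answer above.

    ```r
    g <- c(NA, NA, 2, 2, 2)   # same grouping values as a[, 2] in the question

    f_plain <- factor(g)      # default: NA is not a level
    f_na    <- addNA(g)       # NA becomes an explicit (last) level

    levels(f_plain)           # "2"
    levels(f_na)              # "2" NA
    as.integer(f_plain)       # NA NA 1 1 1
    as.integer(f_na)          # 2 2 1 1 1
    ```

    With a plain factor, the NA rows have no group code and get dropped by grouping functions; with addNA() they carry an ordinary code and are summed like any other group.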
    
  • 2021-01-13 11:58

    Use summarize() from dplyr:

    library(dplyr)
    
    a %>%
      as.data.frame %>%
      group_by(V2) %>%
      summarize(V1_sum = sum(V1))
    
  • 2021-01-13 12:01

    Using sqldf:

    library(sqldf)

    a <- as.data.frame(a)
    sqldf("SELECT V2 [Group], SUM(V1) x 
          FROM a 
          GROUP BY V2")
    

    Output:

      Group x
    1    NA 2
    2     2 3
    

    stats package

    A variation of AdamO's proposal:

    data.frame(xtabs(V1 ~ V2, data = a, na.action = na.pass, exclude = NULL))
    

    Output:

        V2 Freq
    1    2    3
    2 <NA>    2
    
  • 2021-01-13 12:01

    You can also try aggregating by is.na(a[,2]) instead.

    aggregate(a[,1], by=list(is.na(a[,2])), sum)
    
    #   Group.1 x
    # 1   FALSE 3
    # 2    TRUE 2
    

    If you want a finer distinction than just NA or not NA, you can define a new variable that uses a previously unused value to denote NA (a factor would be more elegant, but a numeric vector is the simplest):

    b <- a[,2]
    b[is.na(b)] <- 999
    aggregate(a[,1], by=list(b), sum)
    
    #   Group.1 x
    # 1       2 3
    # 2     999 2
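
    The factor-based variant mentioned above could look roughly like the following. It is a sketch rather than part of the original answer: it rebuilds the question's matrix so that it runs on its own, and the names b, res and Group are chosen here for illustration.

    ```r
    # Rebuild the example matrix from the question
    a <- matrix(1, 5, 2)
    a[1:2, 2] <- NA
    a[3:5, 2] <- 2

    # exclude = NULL keeps NA as an explicit factor level instead of dropping it
    b <- factor(a[, 2], exclude = NULL)

    # Sum column 1 within each level of b, including the NA level
    res <- aggregate(a[, 1], by = list(Group = b), FUN = sum)
    res
    ```

    Unlike the 999 placeholder, this cannot collide with a real value in the data, and the missing group is still reported as NA.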
    
  • 2021-01-13 12:18

    Rich's addNA solution doesn't require any substantial change to the aggregate syntax, so I think it's the best solution. Another option, which produces output similar to table() (and can therefore be coerced into a data.frame structure similar to that of aggregate()), is xtabs():

    xtabs(a[, 1] ~ a[, 2], addNA = TRUE)
    

    Gives:

      Group.1 x
    1       2 3
    2    <NA> 2
    

    Another "trick" is to assign a missing code to these data. We all like R's NA output, but assigning a missing code to a grouping variable is a good coding exercise. Choose the code so that it has one more digit than the largest absolute value in the dataset and is of the form -99...9:

    codemiss <- function(x) -(10^(floor(log10(max(abs(x), na.rm = TRUE))) + 2) - 1)

    This works in general for data whose largest absolute value is at least 1.

    Then you get

    a[, 2][is.na(a[, 2])] <- codemiss(a[, 2])

    And:

    aggregate(a[, 1], list(a[, 2]), sum)

    Gives you:

      Group.1 x
    1     -99 2
    2       2 3
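
    As a rough end-to-end check of the missing-code idea, the sketch below computes a code, verifies that it does not collide with any real value, and then groups on it. The helper name sentinel_code and the vector g are hypothetical and used only for this illustration; tapply() stands in for aggregate() to keep the example short.

    ```r
    # Hypothetical helper (not from the answer): a code of the form -99...9 with
    # one more digit than the largest absolute value in x.
    sentinel_code <- function(x) {
      digits <- floor(log10(max(abs(x), na.rm = TRUE))) + 1  # digits in the largest magnitude
      -(10^(digits + 1) - 1)
    }

    g <- c(NA, NA, 2, 2, 2)        # same values as a[, 2] in the question
    code <- sentinel_code(g)       # -99 for this data
    stopifnot(!code %in% g)        # the code must not collide with real data

    g[is.na(g)] <- code
    tapply(rep(1, 5), g, sum)      # groups -99 and 2, with sums 2 and 3
    ```

    The collision check is worth keeping in real code, since a missing code that also occurs as a genuine value would silently merge two groups.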
    