Aggregate by NA in R

后端未结

Does anybody know how to aggregate by NA in R?

If you take the example below

a <- matrix(1,5,2)
a[1:2,2] <- NA
a[3:5,2] <- 2
aggregate(a[,1], by=list(a[,2]), sum)  # rows where a[,2] is NA are dropped from the result

5 answers

迷失自我 · 2021-01-13 11:58
              
Instead of aggregate(), you may want to consider rowsum(). It is designed for exactly this operation (summing the rows of a matrix by group) and is typically much faster than aggregate(). We can add NA to the factor levels of a[, 2] with addNA(), which ensures that NA shows up as a grouping level.

rowsum(a[, 1], addNA(a[, 2]))
#      [,1]
# 2       3
# <NA>    2
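For context on why addNA() is needed: it turns the missing values into a regular factor level, so grouping code no longer treats those entries as missing. A small sketch comparing what factor() and addNA() produce (g is just a copy of the values in a[, 2]):

```r
g <- c(NA, NA, 2, 2, 2)  # same values as a[, 2] in the question

# factor() leaves NA out of the levels, so those entries stay missing
f_plain <- factor(g)
levels(f_plain)  # "2"
is.na(f_plain)   # TRUE TRUE FALSE FALSE FALSE

# addNA() appends NA as an explicit level; the entries now point to that level
f_na <- addNA(g)
levels(f_na)     # "2" NA
is.na(f_na)      # all FALSE
```

Since the NA entries of f_na are no longer missing, grouping functions that drop incomplete cases keep them as a group of their own.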


If you still want to use aggregate(), you can incorporate addNA() as well.

aggregate(a[, 1], list(Group = addNA(a[, 2])), sum)
#   Group x
# 1     2 3
# 2  <NA> 2


And one more option, with data.table:

library(data.table)
as.data.table(a)[, .(x = sum(V1)), by = .(Group = V2)]
#    Group x
# 1:    NA 2
# 2:     2 3

轮回少年 · 2021-01-13 11:58
              
Use summarize() from dplyr; group_by() keeps NA as a group of its own:

library(dplyr)

a %>%
  as.data.frame %>%
  group_by(V2) %>%
  summarize(V1_sum = sum(V1))

难免孤独 · 2021-01-13 12:01
              
Using sqldf:

library(sqldf)

a <- as.data.frame(a)
sqldf("SELECT V2 [Group], SUM(V1) x
      FROM a
      GROUP BY V2")


Output:

  Group x
1    NA 2
2     2 3

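Using the stats package:

A variation of AdamO's xtabs() proposal (wrapping a in as.data.frame() lets this run even if the sqldf step above was skipped):

data.frame(xtabs(V1 ~ V2, data = as.data.frame(a), na.action = na.pass, exclude = NULL))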


Output:

    V2 Freq
1    2    3
2 <NA>    2

轻奢々 · 2021-01-13 12:01
              
You can also try aggregating by is.na(a[,2]) instead.

aggregate(a[,1], by=list(is.na(a[,2])), sum)

#   Group.1 x
# 1   FALSE 3
# 2    TRUE 2


If you want a finer distinction than just NA or not, you may want to define a new variable that uses a previously unused value to denote NA (a factor would be more elegant, but a numeric vector is the simplest):

b <- a[,2]
b[is.na(b)] <- 999
aggregate(a[,1], by=list(b), sum)

#   Group.1 x
# 1       2 3
# 2     999 2
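The parenthetical above notes that a factor would be more elegant than a sentinel value such as 999. A minimal sketch of two alternatives, rebuilding the question's matrix a: factor(..., exclude = NULL) keeps NA as a level, while a character label for the missing values avoids picking a numeric code at all (the label "missing" is an arbitrary, hypothetical choice).

```r
a <- matrix(1, 5, 2)
a[1:2, 2] <- NA
a[3:5, 2] <- 2

# Variant 1: a factor that keeps NA as one of its levels
b1 <- factor(a[, 2], exclude = NULL)
res1 <- aggregate(a[, 1], by = list(b1), sum)
res1

# Variant 2: a character label for missing values ("missing" is arbitrary)
b2 <- ifelse(is.na(a[, 2]), "missing", as.character(a[, 2]))
res2 <- aggregate(a[, 1], by = list(b2), sum)
res2
```

Variant 1 keeps the NA semantics visible in the output, while variant 2 makes the missing group self-describing without relying on a number that might collide with real data.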

说谎 · 2021-01-13 12:18
              
Rich's addNA() solution doesn't require any substantial change to the aggregate() syntax, so I think it's the best solution. I'll point out that another option, which produces output similar to table() (and thus can be coerced into a data.frame structure similar to that of aggregate()), is xtabs():

xtabs(a[, 1] ~ a[, 2], addNA = TRUE)


Gives:

a[, 2]
   2 <NA>
   3    2
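To make the coercion to a data frame concrete, here is a minimal sketch, assuming R >= 3.4.0 (where xtabs() has the addNA argument). The data go into a data frame with the column names Group and x, chosen here only so the result resembles aggregate()'s output; as.data.frame() on a table accepts a responseName argument that names the summed column.

```r
# Same data as the question, in a data frame with readable column names
df <- data.frame(Group = c(NA, NA, 2, 2, 2), x = 1)

# Sum x within each level of Group, keeping NA as its own level
tab <- xtabs(x ~ Group, data = df, addNA = TRUE)

# Coerce the one-dimensional table to an aggregate()-style data frame
res <- as.data.frame(tab, responseName = "x")
res
```

Printing res shows the NA group as <NA> in the Group column.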


Another "trick" I see is assigning a missing code to these data. We all like R's NA output, but assigning an explicit missing code to a grouping variable is a good coding exercise. We choose the code so that it has one more digit than the largest absolute value in the data and is of the form -99...9:

codemiss <- function(x) -(10^(floor(log10(max(abs(x), na.rm = TRUE))) + 2) - 1)

works in general.

Then replace the NAs with that code:

a[, 2][is.na(a[, 2])] <- codemiss(a[, 2])

And:

aggregate(a[, 1], list(a[, 2]), sum)

Gives you:

  Group.1 x
1     -99 2
2       2 3
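The digit rule can be checked on a few inputs. This is a minimal sketch using a hypothetical helper named missing_code (not from any package). Because ^ binds more tightly than unary minus in R, the code has to be computed as -(10^k - 1); the expression -10^k - 1 would give -101 rather than -99 for the question's data.

```r
# Hypothetical helper: a negative code of the form -99...9 with one more digit
# than the largest absolute value in x (NAs are ignored)
missing_code <- function(x) {
  k <- floor(log10(max(abs(x), na.rm = TRUE))) + 2  # number of 9s in the code
  -(10^k - 1)
}

missing_code(c(50, NA))   # -999
missing_code(c(-3, 120))  # -9999

# Applied to the question's data
a <- matrix(1, 5, 2)
a[1:2, 2] <- NA
a[3:5, 2] <- 2
g <- a[, 2]
g[is.na(g)] <- missing_code(g)  # g is now -99 -99 2 2 2
res <- aggregate(a[, 1], list(g), sum)
res
```

Using abs() means negative data values cannot collide with the code either, since every value in x has fewer digits than the code.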
