R aggregate data by defining grouping

I am having trouble grouping and summing the following data in R:

category freq
1    C1     9
2    C2    39
3    C3     3
4    A1    38
5    A2     2
6    A3


                      
1 answer

执念已碎, 2021-01-26 01:38

Why not add a column to your data frame that holds the letter part of the "category" column? You could then use ddply from the plyr package to sum within each group.

Example:

 df = data.frame(id = c(1,2,3,4,5), category = c("AB1", "AB2", "B1", "B2", "B3"), freq = c(50,51,3,2,26))
 df$new = as.factor(gsub("\\d", "", df$category))


You could then use ddply based on the new column, as follows:

 library(plyr)
 # Named "totals" rather than "aggregate" to avoid masking the base function aggregate().
 totals <- ddply(df, .(new), summarize, freq = sum(freq))


You get the following result:

#  new freq
#1  AB  101
#2   B   31


This works only if you want every category that shares the same letters (once the digits are removed) to fall under the same umbrella category.
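
For comparison, the same letter-based grouping can be sketched without plyr, using base R's aggregate(). The data values and the column name "group" below are illustrative assumptions in the same shape as the example above, not something taken from the original question.

```r
# Example data in the same shape as above (values are illustrative).
df <- data.frame(
  category = c("AB1", "AB2", "B1", "B2", "B3"),
  freq     = c(50, 51, 3, 2, 26),
  stringsAsFactors = FALSE
)

# Strip the digits to keep only the letter part of each category.
df$group <- gsub("[0-9]", "", df$category)

# Sum freq within each group via the formula interface of aggregate().
totals <- aggregate(freq ~ group, data = df, FUN = sum)
print(totals)
#   group freq
# 1    AB  101
# 2     B   31
```

Whether plyr or base R is preferable here is mostly a matter of taste; the base version simply avoids an extra dependency.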

If, however, you want to group arbitrary categories together (in your example, KG, XM and L4 would belong to the same category), you can define new "super" categories and assign each sub-category to the appropriate one. One way to do this is with the switch function, as in the example below:

 df = data.frame(id = c(1,2,3,4,5), category = c("A", "B", "KG", "XM", "L4"), freq = c(50,51,3,2,26))

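 # Map each sub-category to its "super" category.
 fct <- function(cat) {switch(cat, "A" = "CAT1", "B" = "CAT2", "KG" = "CAT3", "XM" = "CAT3", "L4" = "CAT3")}
 # as.character() makes switch() match on names even if category is stored as a factor.
 df$new = as.factor(unlist(lapply(as.character(df$category), fct)))

 totals <- ddply(df, .(new), summarize, freq = sum(freq))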


This will give you:

 #   new freq
 #1 CAT1   50
 #2 CAT2   51
 #3 CAT3   31
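
As an alternative to switch, the mapping can also be kept in a named character vector and applied by indexing. This is only a sketch: the vector name "super_of" is hypothetical, and the mapping just mirrors the example above.

```r
df <- data.frame(
  category = c("A", "B", "KG", "XM", "L4"),
  freq     = c(50, 51, 3, 2, 26),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: sub-category name -> "super" category.
super_of <- c(A = "CAT1", B = "CAT2", KG = "CAT3", XM = "CAT3", L4 = "CAT3")

# Index the lookup vector by name; unname() drops the carried-over names.
# A category missing from super_of would become NA here.
df$new <- unname(super_of[df$category])

# tapply() returns the per-group sums as a named vector.
totals <- tapply(df$freq, df$new, sum)
print(totals)
# CAT1 CAT2 CAT3 
#   50   51   31 
```

Keeping the mapping as data rather than as branches of a function makes it easier to extend, for instance by reading it from a file.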

                                                                        
                                                        
            
            
              
                