Compute means of a group by factor

Is there a way that this can be improved, or done more simply?

means.by<-function(data,INDEX){
  b<-by(data,INDEX,function(d)apply(d,2,mean))
  return(


                      
4 Answers

Answered by 悲&欢浪女, 2021-01-02 07:24

Use tapply or ave, depending on the output you want. ave returns the group mean for every row (same length as the input); tapply returns one mean per group:

> Data <- data.frame(grp=sample(letters[1:3],20,TRUE),x=rnorm(20))
> ave(Data$x, Data$grp)
 [1] -0.3258590 -0.5009832 -0.5009832 -0.2136670 -0.3258590 -0.5009832
 [7] -0.3258590 -0.2136670 -0.3258590 -0.2136670 -0.3258590 -0.3258590
[13] -0.3258590 -0.5009832 -0.2136670 -0.5009832 -0.3258590 -0.2136670
[19] -0.5009832 -0.2136670
> tapply(Data$x, Data$grp, mean)
         a          b          c 
-0.5009832 -0.2136670 -0.3258590 

# Example with more than one column:
> Data <- data.frame(grp=sample(letters[1:3],20,TRUE),x=rnorm(20),y=runif(20))
> do.call(rbind,lapply(split(Data[,-1], Data[,1]), colMeans))
             x         y
a -0.675195494 0.4772696
b  0.270891403 0.5091359
c  0.002756666 0.4053922

                                                                        
                                                        
            
            
              
                
Answered by 暗喜, 2021-01-02 07:25

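You can do this with base R functions alone. Fitting a linear model on the factor with the intercept removed gives the group means as its coefficients: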

> d=data.frame(type=as.factor(rep(c("A","B","C"),each=3)),
+ x=rnorm(9),y=rgamma(9,2,1))
> d
  type           x         y
1    A -1.18077326 3.1428680
2    A -0.91930418 4.4606603
3    A  0.88345422 1.0979301
4    B  0.06964133 1.1429911
5    B -1.15380345 2.7609049
6    B  1.13637202 0.6668986
7    C -1.12052765 1.7352306
8    C -1.34803630 2.3099202
9    C -2.23135374 0.7244689
>
> cbind(lm(x~-1+type,data=d)$coef,lm(y~-1+type,data=d)$coef)
         [,1]     [,2]
typeA -0.4055411 2.900486
typeB  0.0174033 1.523598
typeC -1.5666392 1.589873
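
Why this works: with the intercept removed (the -1 in the formula), the model has one indicator column per level of type, so each least-squares coefficient is the mean of the response within that level. A minimal sketch checking this against tapply follows. The data, the seed, and the names fit and grp_means are made up for illustration.

```r
# Illustrative data; the seed is arbitrary and only makes the run repeatable.
set.seed(1)
d <- data.frame(type = factor(rep(c("A", "B", "C"), each = 3)),
                x = rnorm(9))

# No-intercept model: one coefficient per level of type.
fit <- lm(x ~ -1 + type, data = d)

# Group means computed directly.
grp_means <- tapply(d$x, d$type, mean)

# The coefficients equal the group means up to floating-point error.
all.equal(unname(coef(fit)), as.vector(grp_means))
```

The lm route fits a full model just to get means, so tapply or aggregate is usually the lighter choice. The equivalence is mainly useful if you also want standard errors from summary(fit).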

                                                                        
                                                        
            
            
              
                
Answered by 再見小時候, 2021-01-02 07:33

Does the aggregate function do what you want?

If not, look at the plyr package, which offers several ways to take data apart, run computations on the pieces, and put the results back together.

You may also be able to do this using the reshape package.
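
For reference, here is a minimal sketch of the aggregate approach in base R. The data frame Data and its columns grp, x and y are invented for illustration. The formula cbind(x, y) ~ grp asks for the mean of each listed column within each level of grp.

```r
# Illustrative data, not taken from the question.
set.seed(42)
Data <- data.frame(grp = sample(letters[1:3], 20, TRUE),
                   x = rnorm(20),
                   y = runif(20))

# One row per group, one column of means per variable.
means <- aggregate(cbind(x, y) ~ grp, data = Data, FUN = mean)
print(means)

# The dot form averages every column other than grp.
means_all <- aggregate(. ~ grp, data = Data, FUN = mean)
```

The result is an ordinary data frame, so it can be merged back onto the original data or passed on without further reshaping.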
                                                                        
                                                        
            
            
              
                
Answered by 挽巷, 2021-01-02 07:34

With plyr:

library(plyr)
# x is your data frame, id the grouping column, var the column to average
df <- ddply(x, .(id), function(x) data.frame(
  mean = mean(x$var)
))
print(df)


Update: 

data<-data.frame(I=as.factor(rep(letters[1:10],each=3)),x=rnorm(30),y=rbinom(30,5,.5))
ddply(data,.(I), function(x) data.frame(x=mean(x$x), y=mean(x$y)))


See, plyr is smart :) 

Update 2: 

In response to your comment, I believe cast and melt from the reshape package are much simpler for your purpose. 

library(reshape)
cast(melt(data), I ~ variable, mean)

                                                                        
                                                        
            
            
              
                