Python Pandas average based on condition into new column


猫巷女王i 2021-02-09 16:49

I have a pandas dataframe containing the following data:

matchID    server    court    speed
1          1         A         100
1          2         D         20


      
      
        
3 answers

忘了有多久 (original poster) 2021-02-09 17:51

With groupby, we can still use loc to select the rows we want to fill, but put the whole computation inside a for loop over df.groupby("matchID").

for id, subg in df.groupby("matchID"):
    # Mean speed on court A for servers 1 and 3 within this match
    df.loc[df.matchID==id, "meanSpeedCourtA13"] = (subg
              .where(subg.server.isin([1,3])).where(subg.court == "A").speed.mean())
    # Same computation for court D
    df.loc[df.matchID==id, "meanSpeedCourtD13"] = (subg
              .where(subg.server.isin([1,3])).where(subg.court == "D").speed.mean())


Special thanks to @Dark for pointing out that I was hard-coding the groupby.
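The loop above can also be written without iterating over groups. The following is a minimal sketch of one such variant, not the method from the answer: it filters once, takes the per-match mean, and maps the result back with Series.map. The sample data is made up for illustration, and only the "meanSpeedCourtA13" column is shown.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "matchID": [1, 1, 1, 2, 2, 3],
    "server":  [1, 3, 2, 1, 3, 2],
    "court":   ["A", "A", "A", "A", "D", "A"],
    "speed":   [100, 80, 20, 70, 60, 50],
})

# Keep only rows served by server 1 or 3 on court A.
mask = df.server.isin([1, 3]) & (df.court == "A")

# One mean per matchID, computed on the filtered rows only.
means = df[mask].groupby("matchID").speed.mean()

# Broadcast each match's mean back to all of its rows.
# A matchID with no qualifying rows is absent from `means` and maps to NaN.
df["meanSpeedCourtA13"] = df.matchID.map(means)

print(df)
```

A match with no qualifying rows ends up with NaN here, which is also what the loop produces, since the mean of an all-NaN column is NaN.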

loc selects values using information from two axes: rows and columns. Following the convention in the documentation, the row selector comes first and the column selector second. For example, in df.loc[df.matchID==id, "meanSpeedCourtD13"], df.matchID==id selects the rows whose matchID equals id, and "meanSpeedCourtD13" names the column to assign to.
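A small sketch of that rows-first, columns-second pattern follows. The data and the column name "flag" are hypothetical, chosen only to show the assignment.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "matchID": [1, 1, 2, 2],
    "server":  [1, 2, 1, 3],
    "court":   ["A", "D", "A", "D"],
    "speed":   [100, 20, 80, 60],
})

# Row selector first (a boolean mask), column selector second (a label).
# "flag" does not exist yet, so the assignment creates it;
# rows not selected by the mask are left as NaN.
df.loc[df.matchID == 1, "flag"] = 1.0

print(df)
```

This is why the loop can fill a new column one match at a time: each iteration writes only to the rows of the current matchID.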

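Side notes about calculating the mean:


For each group subg:
where(subg.server.isin([1,3])) masks the rows whose server is not in [1, 3], replacing their values with NaN (the rows are kept, not dropped).
where(subg.court == "A") applies the same kind of masking on court.
Finally, speed.mean() computes the mean of speed, skipping the NaN entries.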
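The steps above can be checked on a tiny group. This is a sketch with invented data. It also compares the chained where calls against plain boolean indexing, which gives the same mean here.

```python
import pandas as pd

# Hypothetical single group, invented for this sketch.
subg = pd.DataFrame({
    "server": [1, 2, 3, 1],
    "court":  ["A", "A", "A", "D"],
    "speed":  [100, 20, 60, 50],
})

# where() keeps the shape and replaces non-matching rows with NaN.
masked = subg.where(subg.server.isin([1, 3])).where(subg.court == "A")

# mean() skips NaN by default, so only the surviving rows contribute.
mean_chained = masked.speed.mean()

# Equivalent result with a combined boolean mask and loc.
mean_indexed = subg.loc[subg.server.isin([1, 3]) & (subg.court == "A"), "speed"].mean()

print(masked)
print(mean_chained, mean_indexed)
```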




As an alternative, you can use np.where to assign a value to each matchID in [1, 2]. This works only when matchID takes exactly two values. On my computer it runs at roughly the same speed as the groupby method above. To save space, only the "meanSpeedCourtA13" column is shown.

import numpy as np

# First we calculate the means
# Calculate mean for the group with matchID being 1
meanSpeedCourtA13_ID1 = (df[df.matchID==1].
                 where(df.server.isin([1,3])).where(df.court == "A").speed.mean())
# Calculate mean for the group with matchID being 2
meanSpeedCourtA13_ID2 = (df[df.matchID==2].
                 where(df.server.isin([1,3])).where(df.court == "A").speed.mean())
# Use np.where to allocate values to each matchID in [1, 2]
df["meanSpeedCourtA13"] = np.where(df.matchID == 1,
                                   meanSpeedCourtA13_ID1, meanSpeedCourtA13_ID2)


np.where(condition, x, y) returns x wherever condition is true and y otherwise, element by element. See the np.where documentation for details.
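A minimal illustration of that element-wise behavior, with made-up values standing in for the two precomputed means:

```python
import numpy as np
import pandas as pd

# Hypothetical matchID column and placeholder means, invented for this sketch.
match_id = pd.Series([1, 1, 2, 2])
mean_id1 = 10.0
mean_id2 = 20.0

# For each element: mean_id1 where matchID == 1, mean_id2 everywhere else.
result = np.where(match_id == 1, mean_id1, mean_id2)

print(result)
```

Because every element that fails the condition receives the second value, a third matchID would silently get the mean of match 2, which is why this approach only fits the two-match case.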
    
             
                                                        
            
            
              
                