Specific pandas columns as arguments in new column of df.apply outputs


Given a pandas DataFrame as below:

import pandas as pd
from sklearn.metrics import mean_squared_error

df = pd.DataFrame.from_dict(
     {'row': ['a


                      
2 Answers

        
                         				            
            
           
            
                              
                
              
              
                
一整个雨季
2021-01-24 19:00

The df.apply approach:

df['rmse'] = df.apply(lambda x: mean_squared_error(x[['a','b','c']], x[['d','e','y']])**0.5, axis=1)

col     a     b     c     d     e     y      rmse
row                                              
a    0.00 -0.80 -0.60 -0.30  0.80  0.01  1.003677
b   -0.80  0.00  0.50  0.70 -0.90  0.01  1.048825
c   -0.60  0.50  0.00  0.30  0.10  0.01  0.568653
d   -0.30  0.70  0.30  0.00  0.20  0.01  0.375988
e    0.80 -0.90  0.10  0.20  0.00  0.01  0.626658
y    0.01  0.01  0.01  0.01  0.01  0.00  0.005774
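
Since the question's DataFrame constructor is cut off, here is a minimal sketch that rebuilds the frame from the values printed in the output above (the index and column names "row" and "col" are assumed from that printout) and computes the same per-row RMSE with plain NumPy instead of scikit-learn. The `.to_numpy()` calls matter: subtracting two Series with different labels aligns on the labels and would give NaN.

```python
import numpy as np
import pandas as pd

# Values copied from the printed output above; the index/column names
# ("row", "col") are an assumption taken from that printout.
labels = ["a", "b", "c", "d", "e", "y"]
data = [
    [0.00, -0.80, -0.60, -0.30, 0.80, 0.01],
    [-0.80, 0.00, 0.50, 0.70, -0.90, 0.01],
    [-0.60, 0.50, 0.00, 0.30, 0.10, 0.01],
    [-0.30, 0.70, 0.30, 0.00, 0.20, 0.01],
    [0.80, -0.90, 0.10, 0.20, 0.00, 0.01],
    [0.01, 0.01, 0.01, 0.01, 0.01, 0.00],
]
df = pd.DataFrame(
    data,
    index=pd.Index(labels, name="row"),
    columns=pd.Index(labels, name="col"),
)

def row_rmse(row):
    # Convert to arrays first: Series subtraction aligns on labels,
    # so row[["a","b","c"]] - row[["d","e","y"]] would be all NaN.
    left = row[["a", "b", "c"]].to_numpy()
    right = row[["d", "e", "y"]].to_numpy()
    return np.sqrt(np.mean((left - right) ** 2))

df["rmse"] = df.apply(row_rmse, axis=1)
```

Printing `df` afterwards should reproduce the table shown above.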

                                                                        
                                                        
            
            
              
                
          
          	          
            
           
            
                              
                
              
              
                
鱼传尺愫
2021-01-24 19:04

Approach #1

For performance, one approach is to work on the underlying array data with NumPy ufuncs: slice out the two blocks of columns and apply the ufuncs to them in a vectorized manner, like so -

import numpy as np

a = df.values  # underlying 2-D NumPy array of the frame
# columns 0-2 hold a, b, c; columns 3-5 hold d, e, y
rmse_out = np.sqrt(((a[:,0:3] - a[:,3:6])**2).mean(1))
df['rmse_out'] = rmse_out


Approach #2

A faster alternative computes the RMSE values with np.einsum, which replaces the square-and-sum step -

diffs = a[:,0:3] - a[:,3:6]
rmse_out = np.sqrt(np.einsum('ij,ij->i',diffs,diffs)/3.0)
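
To make the einsum subscript string concrete, here is a small sketch on a made-up 2x3 array (the values are hypothetical, chosen only so the sums are easy to check by hand). It shows that 'ij,ij->i' is a row-wise dot product, so it gives the same result as squaring and summing along axis 1.

```python
import numpy as np

# Hypothetical 2x3 array of differences, just for illustration.
diffs = np.array([[1.0, 2.0, 3.0],
                  [0.5, -0.5, 0.0]])

# 'ij,ij->i' multiplies matching elements and sums over j (the columns),
# i.e. a row-wise dot product of diffs with itself.
sq_sum_einsum = np.einsum('ij,ij->i', diffs, diffs)
sq_sum_plain = (diffs ** 2).sum(axis=1)

# Dividing by the number of columns turns the sum into a mean.
rmse_einsum = np.sqrt(sq_sum_einsum / diffs.shape[1])
rmse_plain = np.sqrt((diffs ** 2).mean(axis=1))
```

Both `sq_sum_einsum` and `sq_sum_plain` hold 14.0 and 0.5 for the two rows.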


Approach #3

Another way to compute rmse_out uses the identity:


  (a - b)^2 = a^2 + b^2 - 2ab


First extract the two slices:

s0 = a[:,0:3]
s1 = a[:,3:6]


Then rmse_out would be -

rmse_out = np.sqrt(((s0**2).sum(1) + (s1**2).sum(1) - (2*s0*s1).sum(1))/3.0)


which with einsum becomes -

rmse_out = np.sqrt((np.einsum('ij,ij->i',s0,s0) +
                    np.einsum('ij,ij->i',s1,s1) -
                  2*np.einsum('ij,ij->i',s0,s1))/3.0)
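
As a sanity check, the sketch below compares the expanded form against the direct difference-then-square form on random data (the seed and array shape are arbitrary choices, not from the original post). One caveat worth keeping in mind: the expanded form subtracts quantities of similar size when the two blocks are close to each other, so rounding can leave a tiny negative sum; clipping at zero before the square root is a defensive step added here, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed, arbitrary test data
a = rng.normal(size=(1000, 6))
s0, s1 = a[:, 0:3], a[:, 3:6]

# Direct form: subtract first, then square and average per row.
direct = np.sqrt(((s0 - s1) ** 2).mean(axis=1))

# Expanded form: a^2 + b^2 - 2ab, each term as a row-wise dot product.
expanded_sum = (np.einsum('ij,ij->i', s0, s0)
                + np.einsum('ij,ij->i', s1, s1)
                - 2 * np.einsum('ij,ij->i', s0, s1))
# Rounding can push a near-zero sum slightly negative; clip before sqrt.
expanded = np.sqrt(np.clip(expanded_sum, 0, None) / 3.0)

max_abs_diff = np.abs(direct - expanded).max()
```

On data like this the two forms should agree to within floating-point rounding.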




Getting respective column indices 

If you are not sure whether the columns a, b, ... appear in that order, you can look up their positions with column_index, a helper function (not a pandas built-in) that maps column labels to their integer positions.

Thus a[:,0:3] would be replaced by a[:,column_index(df, ['a','b','c'])] and a[:,3:6] by a[:,column_index(df, ['d','e','y'])].
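
The definition of column_index is not included in this answer. A minimal stand-in, assuming the column labels are unique, could be built on pandas' `Index.get_indexer`, which returns the integer position of each requested label and -1 for labels that are missing. The function body below is a hypothetical sketch, not the original helper, and the one-row frame is illustrative data with the columns deliberately out of order.

```python
import numpy as np
import pandas as pd

def column_index(df, query_cols):
    """Hypothetical stand-in for the helper named above: integer
    positions of query_cols in df.columns (labels assumed unique)."""
    idx = df.columns.get_indexer(query_cols)
    if (idx == -1).any():
        raise KeyError("some requested columns are missing")
    return idx

# Small illustrative frame with the columns deliberately out of order.
df = pd.DataFrame([[0.8, 0.01, 0.0, -0.3, -0.6, -0.8]],
                  columns=['e', 'y', 'a', 'd', 'c', 'b'])
a = df.to_numpy()
s0 = a[:, column_index(df, ['a', 'b', 'c'])]
s1 = a[:, column_index(df, ['d', 'e', 'y'])]
rmse_out = np.sqrt(((s0 - s1) ** 2).mean(axis=1))
```

Indexing with the returned positions picks the columns in the requested label order, so the result does not depend on how the frame happens to be laid out.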
                                                                        
                                                        
            
            
              
                