Efficiently processing DataFrame rows with a Python function?

傲寒 2021-02-13 19:08
In many places in our Pandas-using code, we have some Python function process(row). That function is used over DataFrame.iterrows(), taking each

      
      
        
1 Answer

野性不改 (original poster) 2021-02-13 20:01
              

            
            
                        
Apply your function along axis=1. The function receives each row as a Series, and whatever it returns is collected into a new Series object:

df.apply(your_function, axis=1)


Example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'a': np.arange(3),
...                    'b': np.random.rand(3)})
>>> df
   a         b
0  0  0.880075
1  1  0.143038
2  2  0.795188
>>> def func(row):
...     return row['a'] + row['b']
...
>>> df.apply(func, axis=1)
0    0.880075
1    1.143038
2    2.795188
dtype: float64


As for the second part of the question: row-wise operations using pandas apply, even optimised ones, are not the fastest solution there is. They are usually somewhat faster than a Python for loop over iterrows(), but apply with axis=1 still calls the Python function once per row, so it is not the fastest option. You can check this by timing the operations on your own data.
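
A minimal timing sketch of that comparison, assuming a toy process(row) that just adds two columns (the frame size, the repeat count and the helper names with_iterrows, with_apply and vectorized are illustrative choices, not part of the question). Actual timings depend on the machine, the pandas version and the real function, so no numbers are claimed here:

```python
import timeit

import numpy as np
import pandas as pd


def process(row):
    # Stand-in for the real per-row function (assumption for this sketch).
    return row['a'] + row['b']


df = pd.DataFrame({'a': np.arange(1000), 'b': np.random.rand(1000)})


def with_iterrows():
    # Explicit Python loop over rows.
    return pd.Series([process(row) for _, row in df.iterrows()], index=df.index)


def with_apply():
    # Row-wise apply: still one Python call per row.
    return df.apply(process, axis=1)


def vectorized():
    # Column-oriented equivalent of process().
    return df['a'] + df['b']


for fn in (with_iterrows, with_apply, vectorized):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.4f} s for 3 runs")
```

All three variants compute the same values; only the time taken differs.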

Some operations can be converted to column-oriented ones (the one in my example can easily be rewritten as just df['a'] + df['b']), but others cannot, especially if you have a lot of branching, special cases, or other logic that has to be performed on each row. In that case, if apply is too slow for you, I would suggest "Cython-izing" your code. Cython plays really nicely with the NumPy C API and can get you close to the maximum speed achievable.
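
As a rough illustration of the column-oriented conversion, here is a sketch in which a single simple branch is expressed with numpy.where. The even/odd rule and the names row_wise and column_wise are made up for this example; heavier branching may not reduce to this form:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.arange(6), 'b': np.linspace(0.0, 1.0, 6)})


def func(row):
    # Hypothetical rule: add b when a is even, subtract it otherwise.
    if row['a'] % 2 == 0:
        return row['a'] + row['b']
    return row['a'] - row['b']


# Row-wise version: one Python call per row.
row_wise = df.apply(func, axis=1)

# Column-oriented version: both branches are computed on whole columns
# and np.where picks the right one element by element.
column_wise = pd.Series(
    np.where(df['a'] % 2 == 0, df['a'] + df['b'], df['a'] - df['b']),
    index=df.index,
)
```

Logic that cannot be written this way is where a compiled approach such as Cython comes in.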

Or you can try numba. :)
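
A hedged sketch of what the Numba route could look like, assuming the row logic only needs numeric columns that can be passed in as NumPy arrays. The function process_rows and its even/odd rule are hypothetical. The try/except fallback is only there so the sketch still runs when Numba is not installed, in which case there is no speed-up:

```python
import numpy as np
import pandas as pd

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without Numba (plain Python speed then).
    def njit(func):
        return func


@njit
def process_rows(a, b):
    # Explicit loop with branching; Numba compiles this to machine code.
    out = np.empty(a.shape[0], dtype=np.float64)
    for i in range(a.shape[0]):
        if a[i] % 2 == 0:
            out[i] = a[i] + b[i]
        else:
            out[i] = a[i] - b[i]
    return out


df = pd.DataFrame({'a': np.arange(5), 'b': np.linspace(0.0, 1.0, 5)})

# Pass plain NumPy arrays, not the DataFrame or Series objects themselves.
df['result'] = process_rows(df['a'].to_numpy(), df['b'].to_numpy())
```

The first call includes compilation time, so it is worth timing a second call when comparing against apply.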
    
             
                                                        
            
            
              
                