Fastest way to calculate in Pandas?

Given these two dataframes:

df1 =
     Name  Start  End
  0  A     10     20
  1  B     20     30
  2  C     30     40

df2 =
     0   1
  0  5   10
  1  15  20
  2  25  30


                      
3 Answers

孤独总比滥情好 (2021-01-27 04:17)

This is one way to go about it:

import numpy as np
import pandas as pd

# create NumPy arrays from the df1 and df2 columns

df1_start = df1.loc[:,'Start'].to_numpy()
df1_end = df1.loc[:,'End'].to_numpy()

df2_start = df2[0].to_numpy()
df2_end = df2[1].to_numpy()

#use np tile to create shapes
#that allow element wise subtraction
tiled_start = np.tile(df1_start,(len(df2),1)).T
tiled_end = np.tile(df1_end,(len(df2),1)).T

#subtract df2 from df1
start = np.subtract(tiled_start,df2_start)
end = np.subtract(tiled_end, df2_end)

#create columns for start and end
start_columns = [f'Start_Diff_{num}' for num in range(len(df2))]
end_columns = [f'End_Diff_{num}' for num in range(len(df2))]

#create dataframes of start and end
start_df = pd.DataFrame(start,columns=start_columns)
end_df = pd.DataFrame(end, columns = end_columns)

#lump start and end into one dataframe
lump = pd.concat([start_df,end_df],axis=1)

# sort the columns by their numeric suffix so each Start/End pair sits together
# (sorted() is stable, so Start_Diff_n stays ahead of End_Diff_n)
cols = sorted(lump.columns, key=lambda x: int(x.rsplit('_', 1)[1]))

lump = lump.reindex(cols, axis='columns')

#hook lump back to df1
final = pd.concat([df1,lump],axis=1)
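
The np.tile step is not strictly required, since NumPy broadcasting can produce the same differences directly. A minimal sketch, assuming df2 has the three rows used in the other answers; pairwise_diff is a hypothetical helper name, not part of the original answer:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"Name": ["A", "B", "C"],
                    "Start": [10, 20, 30],
                    "End": [20, 30, 40]})
# Assumption: df2 has the three rows used elsewhere in this thread.
df2 = pd.DataFrame([[5, 10], [15, 20], [25, 30]])


def pairwise_diff(left, right):
    """Hypothetical helper: result[i, j] = left[i] - right[j]."""
    left = np.asarray(left)
    right = np.asarray(right)
    # left[:, None] has shape (n, 1) and right has shape (m,).
    # Broadcasting treats both as (n, m) without building tiled copies first.
    return left[:, None] - right


start = pairwise_diff(df1["Start"], df2[0])
end = pairwise_diff(df1["End"], df2[1])

# Cross-check against the tile-based computation from the answer.
tiled = np.tile(df1["Start"].to_numpy(), (len(df2), 1)).T - df2[0].to_numpy()
print(np.array_equal(start, tiled))
```

The rest of the answer (naming the columns, concatenating, reordering) would work unchanged on these start and end arrays.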

                                                                        
                                                        
            
            
              
                
栀梦 (2021-01-27 04:35)

I suggest using NumPy here. As a first step, convert the selected columns to 2D NumPy arrays:

import pandas as pd

a = df1[['Start','End']].to_numpy()
b = df2[[0,1]].to_numpy()


Broadcasting the subtraction produces a 3D array, so reshape it to a 2D array with one row per row of df1:

c = (a - b[:, None]).swapaxes(0,1).reshape(a.shape[0],-1)
print(c)
[[  5  10  -5   0 -15 -10]
 [ 15  20   5  10  -5   0]
 [ 25  30  15  20   5  10]]
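
To see why the swapaxes and reshape calls are needed, it can help to print the shape at each step. A sketch, assuming the same values as above written out as plain arrays (the df2 values are inferred from the printed output):

```python
import numpy as np

a = np.array([[10, 20], [20, 30], [30, 40]])   # df1[['Start', 'End']]
b = np.array([[5, 10], [15, 20], [25, 30]])    # assumed df2 values

# b[:, None] has shape (3, 1, 2); the subtraction broadcasts to (3, 3, 2),
# indexed as [df2 row, df1 row, Start/End].
diff = a - b[:, None]
print(diff.shape)

# Move the df1 row to the front: [df1 row, df2 row, Start/End].
swapped = diff.swapaxes(0, 1)
print(swapped.shape)

# Flatten the last two axes so each df1 row becomes one flat row of
# (Start_Diff_0, End_Diff_0, Start_Diff_1, End_Diff_1, ...).
c = swapped.reshape(a.shape[0], -1)
print(c.shape)

# Equivalent formulation that puts the df1 axis first from the start,
# so no swapaxes is needed.
c_alt = (a[:, None, :] - b[None, :, :]).reshape(a.shape[0], -1)
print(np.array_equal(c, c_alt))
```

Either form should give the same 2D result; which one reads better is mostly a matter of taste.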


Finally, generate the column names and add the result to the original with DataFrame.join:

cols = [item for x in range(b.shape[0]) for item in (f'Start_Diff_{x}', f'End_Diff_{x}')]
df = df1.join(pd.DataFrame(c, columns=cols, index=df1.index))
print(df)
  Name  Start  End  Start_Diff_0  End_Diff_0  Start_Diff_1  End_Diff_1  \
0    A     10   20             5          10            -5           0   
1    B     20   30            15          20             5          10   
2    C     30   40            25          30            15          20   

   Start_Diff_2  End_Diff_2  
0           -15         -10  
1            -5           0  
2             5          10  

                                                                        
                                                        
            
            
              
                
野趣味 (2021-01-27 04:38)

Don't use iterrows(). If you're simply subtracting values, use vectorization with NumPy (pandas also offers vectorized operations, but working on the underlying NumPy arrays usually has less overhead).

For instance:

import pandas as pd

df2 = pd.DataFrame([[5, 10], [15, 20], [25, 30]], columns=None)

col_names = "Start_Diff_1 End_Diff_1".split()
df3 = pd.DataFrame(df2.to_numpy() - 10, columns=col_names)


Here df3 equals:

   Start_Diff_1  End_Diff_1
0            -5           0
1             5          10
2            15          20
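
To make the iterrows() point concrete, here is a sketch that computes the same differences once with a row loop and once with a single vectorized subtraction, then checks that they agree. The values mirror the example above; the names slow and fast are illustrative only.

```python
import pandas as pd

df2 = pd.DataFrame([[5, 10], [15, 20], [25, 30]])
col_names = ["Start_Diff_1", "End_Diff_1"]

# Row-by-row version: correct, but every iteration builds a Series object.
slow_rows = [[row[0] - 10, row[1] - 10] for _, row in df2.iterrows()]
slow = pd.DataFrame(slow_rows, columns=col_names)

# Vectorized version: one subtraction over the whole array.
fast = pd.DataFrame(df2.to_numpy() - 10, columns=col_names)

# Element-wise comparison of the two results.
print((slow == fast).all().all())
```

On a three-row frame the speed difference is negligible; it tends to matter once the frame has many thousands of rows.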


You can also change column names by doing:

df2.columns = "Start_Diff_0 End_Diff_0".split()


You can use f-strings to build the column names in a loop, e.g. f"Start_Diff_{i}", where i is the loop index.
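
A small sketch of that loop, assuming you already have one two-column block of differences per df2 row (the blocks list and its values are made up for illustration):

```python
import pandas as pd

# Assumed example data: one block of (Start, End) differences per df2 row.
blocks = [pd.DataFrame([[5, 10], [15, 20], [25, 30]]),
          pd.DataFrame([[-5, 0], [5, 10], [15, 20]])]

for i, block in enumerate(blocks):
    # i is the loop index that ends up in the column names
    block.columns = [f"Start_Diff_{i}", f"End_Diff_{i}"]

combined = pd.concat(blocks, axis=1)
print(list(combined.columns))
# ['Start_Diff_0', 'End_Diff_0', 'Start_Diff_1', 'End_Diff_1']
```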

You can also combine multiple dataframes with:

df = pd.concat([df1, df2],axis=1)
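
One thing to watch with pd.concat(..., axis=1) is that it aligns rows on the index, not on position. A sketch of the difference, assuming a df1 with a deliberately non-default index (the index values and variable names are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["A", "B", "C"],
                    "Start": [10, 20, 30],
                    "End": [20, 30, 40]},
                   index=[10, 11, 12])   # deliberately non-default index

# Subtract one (start, end) pair from every row of df1.
diffs = df1[["Start", "End"]].to_numpy() - [5, 10]

# Reusing df1.index keeps the new columns lined up with the original rows.
aligned = pd.DataFrame(diffs, columns=["Start_Diff_0", "End_Diff_0"],
                       index=df1.index)
out = pd.concat([df1, aligned], axis=1)
print(out.shape)

# With a fresh 0..2 index the labels do not match, so concat pads with NaN
# and the result has six rows instead of three.
misaligned = pd.concat([df1, pd.DataFrame(diffs)], axis=1)
print(misaligned.shape)
```

If df1 still has its default 0..n-1 index, as in the question, both variants line up and this does not matter.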

                                                                        
                                                        
            
            
              
                