Fastest way to calculate in Pandas?

耶瑟儿~ 2021-01-27 03:44

Given these two dataframes:

df1 =
     Name  Start  End
  0  A     10     20
  1  B     20     30
  2  C     30     40

df2 =
     0   1
  0  5   10
  1  15  20
  2  25  30
         


        
3 Answers
  •  栀梦 (OP)
    2021-01-27 04:35

    I suggest using NumPy here. As a first step, convert the selected columns to 2D NumPy arrays:

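    import pandas as pd  # needed for pd.DataFrame in the last step
    a = df1[['Start','End']].to_numpy()  # one row per df1 row
    b = df2[[0,1]].to_numpy()            # one row per df2 row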
    

    Subtracting with broadcasting produces a 3D array, so swap the axes and reshape it to a 2D array with one row per row of df1:

    c = (a - b[:, None]).swapaxes(0,1).reshape(a.shape[0],-1)
    print (c)
    [[  5  10  -5   0 -15 -10]
     [ 15  20   5  10  -5   0]
     [ 25  30  15  20   5  10]]
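
    To make the broadcasting step easier to follow, here is a minimal sketch that prints the shape after each operation. The array values are copied from the question. The third row of b, (25, 30), is an assumption inferred from the six-column output printed above, and the step1..step3 names are only illustrative.

```python
import numpy as np

# Values of df1[['Start', 'End']] from the question.
a = np.array([[10, 20], [20, 30], [30, 40]])
# Assumption: df2 has a third row (25, 30), inferred from the printed output.
b = np.array([[5, 10], [15, 20], [25, 30]])

step1 = b[:, None]                 # shape (3, 1, 2): adds an axis to broadcast over
step2 = a - step1                  # shape (3, 3, 2): indexed [df2 row, df1 row, Start/End]
step3 = step2.swapaxes(0, 1)       # shape (3, 3, 2): indexed [df1 row, df2 row, Start/End]
c = step3.reshape(a.shape[0], -1)  # shape (3, 6): one output row per df1 row

print(step1.shape, step2.shape, step3.shape, c.shape)
print(c)
```

    Under the same assumptions, (a[:, None] - b).reshape(len(a), -1) should give the same array without the axis swap, because the df1 axis then already comes first.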
    

    Finally, generate the column names and add the new columns to the original DataFrame with DataFrame.join:

    cols = [item for x in range(b.shape[0]) for item in (f'Start_Diff_{x}', f'End_Diff_{x}')]
    df = df1.join(pd.DataFrame(c, columns=cols, index=df1.index))
    print (df)
      Name  Start  End  Start_Diff_0  End_Diff_0  Start_Diff_1  End_Diff_1  \
    0    A     10   20             5          10            -5           0   
    1    B     20   30            15          20             5          10   
    2    C     30   40            25          30            15          20   
    
       Start_Diff_2  End_Diff_2  
    0           -15         -10  
    1            -5           0  
    2             5          10  
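
    Putting the three steps together, the following is a self-contained sketch rather than a definitive implementation. The helper name add_diff_columns and its left_cols parameter are hypothetical, and df2 is built with an assumed third row (25, 30) so that the result matches the output shown above.

```python
import pandas as pd


def add_diff_columns(left, right, left_cols=("Start", "End")):
    """Hypothetical helper: for every row of `right`, append the
    differences left[left_cols] - right_row as new columns of `left`.
    Assumes `right` has as many columns as `left_cols`."""
    a = left[list(left_cols)].to_numpy()
    b = right.to_numpy()
    # Same expression as in the answer: broadcast, swap axes, flatten.
    c = (a - b[:, None]).swapaxes(0, 1).reshape(a.shape[0], -1)
    cols = [f"{name}_Diff_{i}" for i in range(b.shape[0]) for name in left_cols]
    return left.join(pd.DataFrame(c, columns=cols, index=left.index))


df1 = pd.DataFrame({"Name": ["A", "B", "C"],
                    "Start": [10, 20, 30],
                    "End": [20, 30, 40]})
# Assumption: third row (25, 30) inferred from the answer's output.
df2 = pd.DataFrame([[5, 10], [15, 20], [25, 30]])

df = add_diff_columns(df1, df2)
print(df)
```

    Wrapping the steps in a function keeps the original df1 unchanged, since DataFrame.join returns a new DataFrame.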
    
