Given these two dataframes:
df1 =
Name Start End
0 A 10 20
1 B 20 30
2 C 30 40
df2 =
0 1
0 5 10
1 15 20
This is one way to go about it:
#create numpy arrays of df1 and 2
df1_start = df1.loc[:,'Start'].to_numpy()
df1_end = df1.loc[:,'End'].to_numpy()
df2_start = df2[0].to_numpy()
df2_end = df2[1].to_numpy()
#use np tile to create shapes
#that allow element wise subtraction
tiled_start = np.tile(df1_start,(len(df2),1)).T
tiled_end = np.tile(df1_end,(len(df2),1)).T
#subtract df2 from df1
start = np.subtract(tiled_start,df2_start)
end = np.subtract(tiled_end, df2_end)
#create columns for start and end
start_columns = [f'Start_Diff_{num}' for num in range(len(df2))]
end_columns = [f'End_Diff_{num}' for num in range(len(df2))]
#create dataframes of start and end
start_df = pd.DataFrame(start,columns=start_columns)
end_df = pd.DataFrame(end, columns = end_columns)
#lump start and end into one dataframe
lump = pd.concat([start_df,end_df],axis=1)
#sort the columns by the digits at the end
filtered = final.columns[final.columns.str.contains('\d')]
cols = sorted(filtered, key = lambda x: x[-1])
lump = lump.reindex(cols,axis='columns')
#hook lump back to df1
final = pd.concat([df1,lump],axis=1)
I suggest use here numpy - convert selected columns to 2d numpy
array in first step::
a = df1[['Start','End']].to_numpy()
b = df2[[0,1]].to_numpy()
Output is 3d array, convert it to 2d array
:
c = (a - b[:, None]).swapaxes(0,1).reshape(a.shape[0],-1)
print (c)
[[ 5 10 -5 0 -15 -10]
[ 15 20 5 10 -5 0]
[ 25 30 15 20 5 10]]
Last generate columns names and with DataFrame.join add to original:
cols = [item for x in range(b.shape[0]) for item in (f'Start_Diff_{x}', f'End_Diff_{x}')]
df = df1.join(pd.DataFrame(c, columns=cols, index=df1.index))
print (df)
Name Start End Start_Diff_0 End_Diff_0 Start_Diff_1 End_Diff_1 \
0 A 10 20 5 10 -5 0
1 B 20 30 15 20 5 10
2 C 30 40 25 30 15 20
Start_Diff_2 End_Diff_2
0 -15 -10
1 -5 0
2 5 10
Don't use iterrows()
. If you're simply subtracting values, use vectorization with Numpy (Pandas also offers vectorization, but Numpy is faster).
For instance:
df2 = pd.DataFrame([[5, 10], [15, 20], [25, 30]], columns=None)
col_names = "Start_Diff_1 End_Diff_1".split()
df3 = pd.DataFrame(df2.to_numpy() - 10, columns=colnames)
Here df3
equals:
Start_Diff_1 End_Diff_1
0 -5 0
1 5 10
2 15 20
You can also change column names by doing:
df2.columns = "Start_Diff_0 End_Diff_0".split()
You can use f-strings to change column names in a loop, i.e., f"Start_Diff_{i}"
, where i is a number in a loop
You can also combine multiple dataframes with:
df = pd.concat([df1, df2],axis=1)