I have two pandas data frames, a and b:
a1 a2 a3 a4 a5 a6 a7
1 3 4 5 3 4 5
0 2 0
One way is to transpose both frames and merge them on the row values:
s = df1.T.reset_index().merge(df2.T.assign(match=lambda x: x.index))
dict(zip(s['index'],s['match']))
{'a1': 'b5', 'a2': 'b7', 'a3': 'b6', 'a4': 'b4', 'a5': 'b1', 'a6': 'b3', 'a7': 'b2'}
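To make the merge step easier to follow, here is a minimal sketch on made-up frames (the data below is hypothetical, not the frames from the question). It passes the join keys explicitly through `on=`, which is my own choice for readability; the one-liner above relies on merge picking up the shared columns by itself.

```python
import pandas as pd

# Hypothetical data: df2's columns are a permutation of df1's columns.
df1 = pd.DataFrame({"a1": [1, 0], "a2": [3, 2], "a3": [4, 0]})
df2 = pd.DataFrame({"b1": [4, 0], "b2": [1, 0], "b3": [3, 2]})

# After transposing, each original column becomes a row whose values sit
# in columns 0, 1, ... (the original row labels).
left = df1.T.reset_index()                      # columns: 'index', 0, 1
right = df2.T.assign(match=lambda x: x.index)   # columns: 0, 1, 'match'

# Join on the value columns; rows that agree on every value are paired.
s = left.merge(right, on=list(df1.index))
mapping = dict(zip(s["index"], s["match"]))
print(mapping)
```

Because the default merge is an inner join, columns of df1 with no counterpart in df2 simply drop out of the result.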
Here's one way that uses broadcasting to compare every column of a with every column of b, then applies all along the row axis to find the column pairs in which every row matches. The result of np.where on that boolean array gives indexing arrays for both dataframes' column names (with @piR's contribution):
i, j = np.where((a.values[:,None] == b.values[:,:,None]).all(axis=0))
dict(zip(a.columns[j], b.columns[i]))
# {'a7': 'b2', 'a6': 'b3', 'a4': 'b4', 'a2': 'b7'}
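The shapes involved are the part that is easy to get wrong, so here is a small sketch that splits the one-liner into steps. The frames are hypothetical (two rows, three columns) and the intermediate names `eq` and `match` are mine.

```python
import numpy as np
import pandas as pd

# Hypothetical data: b's columns are a permutation of a's columns.
a = pd.DataFrame({"a1": [1, 0], "a2": [3, 2], "a3": [4, 0]})
b = pd.DataFrame({"b1": [4, 0], "b2": [1, 0], "b3": [3, 2]})

# a.values[:, None]    -> shape (rows, 1, a_cols)
# b.values[:, :, None] -> shape (rows, b_cols, 1)
# Broadcasting compares every column of b with every column of a.
eq = a.values[:, None] == b.values[:, :, None]   # shape (rows, b_cols, a_cols)

# A pair of columns matches only if all rows are equal.
match = eq.all(axis=0)                           # shape (b_cols, a_cols)

# i indexes b's columns, j indexes a's columns.
i, j = np.where(match)
mapping = dict(zip(a.columns[j], b.columns[i]))
print(mapping)
```

Note that the intermediate array holds rows × b_cols × a_cols booleans, so memory use grows quickly for wide frames.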
Here is a way using sort_values:
m = df1.T.sort_values(by=[*df1.index]).index
n = df2.T.sort_values(by=[*df2.index]).index
d = dict(zip(m, n))
print(d)
{'a1': 'b5', 'a5': 'b1', 'a2': 'b7', 'a3': 'b6', 'a6': 'b3', 'a7': 'b2', 'a4': 'b4'}
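Pairing by sort order only works if every column of df1 has exactly one counterpart in df2, so it may be worth verifying the result. A minimal sketch on hypothetical frames, with a check (the name `consistent` is mine) that each paired column really holds the same values:

```python
import pandas as pd

# Hypothetical data: df2's columns are a permutation of df1's columns.
df1 = pd.DataFrame({"a1": [1, 0], "a2": [3, 2], "a3": [4, 0]})
df2 = pd.DataFrame({"b1": [4, 0], "b2": [1, 0], "b3": [3, 2]})

# Sort the transposed frames by all value columns; identical columns
# end up in the same position in both orderings.
m = df1.T.sort_values(by=[*df1.index]).index
n = df2.T.sort_values(by=[*df2.index]).index
d = dict(zip(m, n))

# Sanity check: every paired column must contain the same values.
consistent = all(df1[k].tolist() == df2[v].tolist() for k, v in d.items())
print(d, consistent)
```

If `consistent` comes out False, at least one column has no exact match and the positional pairing cannot be trusted.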
Use a tuple of the column values as the hashable key in a dictionary:
d = {(*t,): c for c, t in df2.items()}
{c: d[(*t,)] for c, t in df1.items()}
{'a1': 'b5',
'a2': 'b7',
'a3': 'b6',
'a4': 'b4',
'a5': 'b1',
'a6': 'b3',
'a7': 'b2'}
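For readers who want to see the intermediate dictionary, here is a short sketch on hypothetical frames. It writes `tuple(t)` instead of `(*t,)`; the two build the same tuple.

```python
import pandas as pd

# Hypothetical data: df2's columns are a permutation of df1's columns.
df1 = pd.DataFrame({"a1": [1, 0], "a2": [3, 2], "a3": [4, 0]})
df2 = pd.DataFrame({"b1": [4, 0], "b2": [1, 0], "b3": [3, 2]})

# Reverse lookup table: column values (as a tuple) -> df2 column name.
d = {tuple(t): c for c, t in df2.items()}

# Look up each df1 column by its values.
# This raises KeyError if a df1 column has no identical column in df2.
mapping = {c: d[tuple(t)] for c, t in df1.items()}
print(mapping)
```

Each column is hashed once, so this avoids comparing every column of one frame against every column of the other.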
In case the columns of the two frames don't correspond perfectly, this version produces the dictionary only for columns where there is a match.
d2 = {(*t,): c for c, t in df2.items()}
d1 = {(*t,): c for c, t in df1.items()}
{d1[c]: d2[c] for c in {*d1} & {*d2}}
{'a5': 'b1',
'a2': 'b7',
'a7': 'b2',
'a6': 'b3',
'a3': 'b6',
'a1': 'b5',
'a4': 'b4'}
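A small sketch of the partial-match case, assuming a hypothetical df1 with an extra column `a4` that has no counterpart in df2. The names `partial` and `unmatched` are mine.

```python
import pandas as pd

# Hypothetical data: a4 has no identical column in df2.
df1 = pd.DataFrame({"a1": [1, 0], "a2": [3, 2], "a3": [4, 0], "a4": [9, 9]})
df2 = pd.DataFrame({"b1": [4, 0], "b2": [1, 0], "b3": [3, 2]})

# Values-to-name lookup tables for both frames.
d1 = {tuple(t): c for c, t in df1.items()}
d2 = {tuple(t): c for c, t in df2.items()}

# Only value tuples present in both frames produce a pair.
partial = {d1[k]: d2[k] for k in d1.keys() & d2.keys()}

# Columns of df1 that found no match.
unmatched = sorted(set(df1.columns) - set(partial))
print(partial, unmatched)
```

Since the keys come from a set intersection, the order of the resulting dictionary is not guaranteed to follow the column order of df1.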
idxmax
This borders on the absurd... Don't actually do this.
{c: df2.T.eq(df1[c]).sum(1).idxmax() for c in df1}
{'a1': 'b5',
'a2': 'b7',
'a3': 'b6',
'a4': 'b4',
'a5': 'b1',
'a6': 'b3',
'a7': 'b2'}