Find equal columns between two dataframes

迷失自我 2020-12-29 06:49

I have two pandas data frames, a and b:

a1   a2   a3   a4   a5   a6   a7
1    3    4    5    3    4    5
0    2    0           


        
4 answers
  • 2020-12-29 07:03

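    One way, using merge:

    # df1 and df2 are the question's frames a and b.
    # Transposing turns columns into rows; merge then joins on the shared value columns.
    s = df1.T.reset_index().merge(df2.T.assign(match=lambda x: x.index))
    dict(zip(s['index'], s['match']))
    # {'a1': 'b5', 'a2': 'b7', 'a3': 'b6', 'a4': 'b4', 'a5': 'b1', 'a6': 'b3', 'a7': 'b2'}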
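
A self-contained sketch of the same merge idea on small made-up frames (the question's data is truncated, so these values are hypothetical). Instead of relying on the default index column name, it names the label columns with rename_axis; a_col and b_col are illustrative names, not from the answer.

```python
import pandas as pd

# Hypothetical data: b holds a's columns shuffled and renamed.
a = pd.DataFrame({"a1": [1, 0, 2], "a2": [3, 2, 5], "a3": [4, 0, 6]})
b = pd.DataFrame({"b1": [4, 0, 6], "b2": [1, 0, 2], "b3": [3, 2, 5]})

# Transpose so each original column becomes a row, then turn the
# row labels (old column names) into an explicitly named column.
left = a.T.rename_axis("a_col").reset_index()
right = b.T.rename_axis("b_col").reset_index()

# Join on the value columns, which are labelled by a's original row index.
value_cols = list(a.index)
s = left.merge(right, on=value_cols)

mapping = dict(zip(s["a_col"], s["b_col"]))
print(mapping)  # {'a1': 'b2', 'a2': 'b3', 'a3': 'b1'}
```

Because the merge is an inner join, a column with no counterpart simply drops out of the result.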
    
  • 2020-12-29 07:04

    Here's one way that leverages broadcasting to compare every column of a with every column of b, then calls all(axis=0) on the result to keep only the pairs where every row matches. np.where on that boolean matrix then gives indexing arrays for both dataframes' column names (with @piR's contribution):

    import numpy as np

    # (rows, 1, cols_a) == (rows, cols_b, 1) broadcasts to (rows, cols_b, cols_a)
    i, j = np.where((a.values[:,None] == b.values[:,:,None]).all(axis=0))
    dict(zip(a.columns[j], b.columns[i]))
    # {'a7': 'b2', 'a6': 'b3', 'a4': 'b4', 'a2': 'b7'}
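
To see what the broadcast comparison produces, here is a sketch on hypothetical frames where b has one extra column (b4) with no counterpart in a. The shape comments follow NumPy's broadcasting rules; the data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: b holds a's columns shuffled and renamed, plus an unmatched b4.
a = pd.DataFrame({"a1": [1, 0, 2, 7], "a2": [3, 2, 5, 8], "a3": [4, 0, 6, 9]})
b = pd.DataFrame({"b1": [4, 0, 6, 9], "b2": [1, 0, 2, 7],
                  "b3": [3, 2, 5, 8], "b4": [0, 0, 0, 0]})

# (rows, 1, cols_a) == (rows, cols_b, 1) -> (rows, cols_b, cols_a)
eq = a.to_numpy()[:, None] == b.to_numpy()[:, :, None]
print(eq.shape)  # (4, 4, 3)

# A (b column, a column) pair matches only if all rows agree.
match = eq.all(axis=0)
i, j = np.where(match)  # i indexes b's columns, j indexes a's columns
mapping = dict(zip(a.columns[j], b.columns[i]))
print(mapping)  # {'a3': 'b1', 'a1': 'b2', 'a2': 'b3'}
```

The unmatched b4 has no True entry in its row of match, so it produces no pair.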
    
  • Here is a way using sort_values:

    # Sort each frame's columns by their values, then pair the names up by position.
    m = df1.T.sort_values(by=[*df1.index]).index
    n = df2.T.sort_values(by=[*df2.index]).index
    d = dict(zip(m, n))
    print(d)
    # {'a1': 'b5', 'a5': 'b1', 'a2': 'b7', 'a3': 'b6', 'a6': 'b3', 'a7': 'b2', 'a4': 'b4'}
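
The positional pairing assumes every column has exactly one counterpart. A sketch on hypothetical frames that adds a sanity check (an editorial addition, not part of the answer) confirming the sorted values really agree before zipping the names:

```python
import pandas as pd

# Hypothetical data: b holds a's columns shuffled and renamed.
a = pd.DataFrame({"a1": [1, 0, 2], "a2": [3, 2, 5], "a3": [4, 0, 6]})
b = pd.DataFrame({"b1": [4, 0, 6], "b2": [1, 0, 2], "b3": [3, 2, 5]})

# One row per original column, sorted lexicographically by its values.
sa = a.T.sort_values(by=list(a.index))
sb = b.T.sort_values(by=list(b.index))

# Pairing by position is only valid if the sorted rows are identical.
if sa.shape != sb.shape or not (sa.to_numpy() == sb.to_numpy()).all():
    raise ValueError("the frames do not contain the same columns")

mapping = dict(zip(sa.index, sb.index))
print(mapping)  # {'a1': 'b2', 'a2': 'b3', 'a3': 'b1'}
```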
    
  • 2020-12-29 07:22

    dictionary comprehensions

    Use a tuple of each column's values as a hashable dictionary key:

    d = {(*t,): c for c, t in df2.items()}  # column values as a tuple -> column name in df2
    {c: d[(*t,)] for c, t in df1.items()}   # look up each df1 column by its values
    
    {'a1': 'b5',
     'a2': 'b7',
     'a3': 'b6',
     'a4': 'b4',
     'a5': 'b1',
     'a6': 'b3',
     'a7': 'b2'}
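
A dict keeps one value per key, so if df2 contained two identical columns, d would keep only the last name, and a df1 column with no counterpart raises KeyError. A sketch on made-up data (an editorial variant, not part of the answer) that collects every matching name in a list and returns an empty list for an unmatched column:

```python
from collections import defaultdict

import pandas as pd

# Hypothetical data: b2 and b3 are identical, and a3 has no counterpart.
a = pd.DataFrame({"a1": [1, 0, 2], "a2": [3, 2, 5], "a3": [9, 9, 9]})
b = pd.DataFrame({"b1": [3, 2, 5], "b2": [1, 0, 2], "b3": [1, 0, 2]})

# Map each column's values (as a hashable tuple) to every column name holding them.
lookup = defaultdict(list)
for name, col in b.items():
    lookup[tuple(col)].append(name)

# dict.get does not trigger the defaultdict factory, so misses return [].
mapping = {name: lookup.get(tuple(col), []) for name, col in a.items()}
print(mapping)  # {'a1': ['b2', 'b3'], 'a2': ['b1'], 'a3': []}
```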
    

    In case not every column has a counterpart, this version builds the dictionary only for columns where there is a match.

    d2 = {(*t,): c for c, t in df2.items()}
    d1 = {(*t,): c for c, t in df1.items()}
    
    # keep only the value tuples that occur in both frames
    {d1[c]: d2[c] for c in {*d1} & {*d2}}
    
    {'a5': 'b1',
     'a2': 'b7',
     'a7': 'b2',
     'a6': 'b3',
     'a3': 'b6',
     'a1': 'b5',
     'a4': 'b4'}
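
A sketch of the intersection version on hypothetical frames where a3 and b3 have no counterpart, showing that unmatched columns are simply left out. It uses tuple(col), which builds the same tuple as (*t,), and dict key views, which support & directly; the data is made up.

```python
import pandas as pd

# Hypothetical data: a3 and b3 have no counterpart in the other frame.
a = pd.DataFrame({"a1": [1, 0, 2], "a2": [3, 2, 5], "a3": [9, 9, 9]})
b = pd.DataFrame({"b1": [3, 2, 5], "b2": [1, 0, 2], "b3": [7, 7, 7]})

d1 = {tuple(col): name for name, col in a.items()}
d2 = {tuple(col): name for name, col in b.items()}

# Only value tuples present in both frames produce an entry.
mapping = {d1[key]: d2[key] for key in d1.keys() & d2.keys()}
print(sorted(mapping.items()))  # [('a1', 'b2'), ('a2', 'b1')]
```

Set iteration order is arbitrary, which is why the original output above is not in column order; sorting the items makes the result easy to read.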
    

    idxmax

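    This borders on the absurd... Don't actually do this. For each column of df1 it picks the column of df2 with the most equal entries, so it returns a name even when no column matches exactly.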

    {c: df2.T.eq(df1[c]).sum(1).idxmax() for c in df1}
    
    {'a1': 'b5',
     'a2': 'b7',
     'a3': 'b6',
     'a4': 'b4',
     'a5': 'b1',
     'a6': 'b3',
     'a7': 'b2'}
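
Because idxmax always returns some label, this approach never signals a missing match. A sketch on made-up frames (a1 has no exact counterpart in b) that also reports whether the best candidate agrees in every row:

```python
import pandas as pd

# Hypothetical data: a2 matches b1 exactly, a1 has no exact match in b.
a = pd.DataFrame({"a1": [1, 0, 2], "a2": [3, 2, 5]})
b = pd.DataFrame({"b1": [3, 2, 5], "b2": [1, 0, 7]})

results = {}
for c in a:
    # For each column of b, count the rows where it equals a[c].
    hits = b.T.eq(a[c]).sum(axis=1)
    best = hits.idxmax()
    # An exact match agrees in every row.
    results[c] = (best, bool(hits[best] == len(a)))

print(results)  # {'a1': ('b2', False), 'a2': ('b1', True)}
```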
    