DataFrame algebra in Pandas

暖寄归人 2020-12-17 04:29

Say I have two DataFrames, df1 and df2, that I can join on df1_keys and df2_keys.

I would like to do:

1 Answer
  • 2020-12-17 05:00

    Although these operations aren't supported directly, you can achieve them by manipulating the indexes before performing the join.
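When the join keys live in columns, as df1_keys and df2_keys do in the question, the index manipulation starts by moving those columns into the index. A minimal sketch, assuming each key is a single column with exactly those names; the values and the extra columns a and b are invented for illustration:

```python
import pandas as pd

# Hypothetical frames: the column names df1_keys / df2_keys come from the
# question, but the values and the extra columns a / b are invented.
df1 = pd.DataFrame({"df1_keys": [1, 2, 3], "a": [0.1, 0.2, 0.3]})
df2 = pd.DataFrame({"df2_keys": [3, 4, 5], "b": [1.0, 2.0, 3.0]})

# Move each key column into the index; index set operations then act on the keys.
left = df1.set_index("df1_keys")
right = df2.set_index("df2_keys")

print(left.index.tolist())   # [1, 2, 3]
print(right.index.tolist())  # [3, 4, 5]
```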

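    In the pandas version used below, you can do set minus with the - operator (newer versions spell this ind.difference(ind2), because - now subtracts element-wise):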

    In [11]: ind = pd.Index([1, 2, 3])
    
    In [12]: ind2 = pd.Index([3, 4, 5])
    
    In [13]: ind - ind2
    Out[13]: Int64Index([1, 2], dtype='int64')
    

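    and set union with | and intersection with & (newer versions spell these ind.union(ind2) and ind.intersection(ind2), because | and & now act element-wise):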

    In [14]: ind | ind2
    Out[14]: Int64Index([1, 2, 3, 4, 5], dtype='int64')
    
    In [15]: ind & ind2
    Out[15]: Int64Index([3], dtype='int64')
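Because recent pandas releases no longer treat -, | and & on an Index as set operations, here is a sketch of the same operations using the named Index methods, with the same ind and ind2 as the session above. symmetric_difference is an extra shown only for completeness:

```python
import pandas as pd

ind = pd.Index([1, 2, 3])
ind2 = pd.Index([3, 4, 5])

# Named set methods: portable across old and new pandas versions.
minus = ind.difference(ind2)           # labels in ind but not in ind2
union = ind.union(ind2)                # labels in either index
both = ind.intersection(ind2)          # labels in both indexes
sym = ind.symmetric_difference(ind2)   # labels in exactly one index

print(minus.tolist(), union.tolist(), both.tolist(), sym.tolist())
# [1, 2] [1, 2, 3, 4, 5] [3] [1, 2, 4, 5]
```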
    

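    So if you have DataFrames with these indexes, you can reindex them before joining (on newer pandas, replace ind & ind2 with ind.intersection(ind2) and ind - ind2 with ind.difference(ind2)):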

    In [21]: df = pd.DataFrame(np.random.randn(3), ind, ['a'])  # ind = df.index
    
    In [22]: df2 = pd.DataFrame(np.random.randn(3), ind2, ['b'])  # ind2 = df2.index
    
    In [23]: df.reindex(ind & ind2)
    Out[23]:
              a
    3  1.368518
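A reproducible version of the reindex step, using fixed values in place of np.random.randn and the named set methods in place of the operators; the data is illustrative only:

```python
import pandas as pd

ind = pd.Index([1, 2, 3])
ind2 = pd.Index([3, 4, 5])
# Fixed values stand in for np.random.randn(3) so the result is reproducible.
df = pd.DataFrame({"a": [0.1, 0.2, 0.3]}, index=ind)

# reindex returns exactly the requested labels, in order; a label absent from
# df would come back as a NaN row (here every requested label exists in df).
common = df.reindex(ind.intersection(ind2))     # keeps label 3
only_left = df.reindex(ind.difference(ind2))    # keeps labels 1 and 2

print(common)
print(only_left)
```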
    

    So now you can build up whatever join you want:

    In [24]: df.reindex(ind & ind2).join(df2.reindex(ind & ind2))  # equivalent to inner
    Out[24]:
              a         b
    3  1.368518 -1.335534
    
    In [25]: df.reindex(ind - ind2).join(df2.reindex(ind - ind2))  # join on A set minus B
    Out[25]:
              a   b
    1  1.193652 NaN
    2  0.064467 NaN
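The pattern generalizes: choose the key set with index algebra, restrict both frames to it, then join. The sketch below wraps that in a hypothetical helper, join_on, which is not a pandas function. It also shows a different technique from the one this answer uses: when the keys stay in columns, DataFrame.merge with indicator=True yields the set-minus ("anti") join directly. All data and column names are illustrative assumptions.

```python
import pandas as pd

# Illustrative frames indexed by their join keys (values are made up).
df = pd.DataFrame({"a": [0.1, 0.2, 0.3]}, index=pd.Index([1, 2, 3]))
df2 = pd.DataFrame({"b": [1.0, 2.0, 3.0]}, index=pd.Index([3, 4, 5]))


def join_on(keys: pd.Index, left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: restrict both frames to `keys`, then join on the index.

    Labels in `keys` that a frame lacks come back as NaN in that frame's columns.
    """
    return left.reindex(keys).join(right.reindex(keys))


inner = join_on(df.index.intersection(df2.index), df, df2)           # like an inner join
left_minus_right = join_on(df.index.difference(df2.index), df, df2)  # A set minus B
outer = join_on(df.index.union(df2.index), df, df2)                  # like an outer join

# Alternative technique (keys kept as columns): an anti-join via merge's indicator.
k1 = pd.DataFrame({"df1_keys": [1, 2, 3], "a": [0.1, 0.2, 0.3]})
k2 = pd.DataFrame({"df2_keys": [3, 4, 5], "b": [1.0, 2.0, 3.0]})
flagged = k1.merge(k2, left_on="df1_keys", right_on="df2_keys", how="left", indicator=True)
# "left_only" marks rows of k1 whose key has no match in k2.
anti = flagged.loc[flagged["_merge"] == "left_only", ["df1_keys", "a"]]

print(inner)
print(left_minus_right)
print(anti)
```

The index-based helper keeps the answer's reindex-then-join idea explicit, while the merge variant avoids moving keys into the index at all; which reads better depends on whether your keys are already the index.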
    