Say I have two dataframes, df1 and df2, that I can join on df1_keys and df2_keys.
I would like to do:
Although these aren't supported directly, they can be achieved by tweaking the indexes before attempting the join...
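If the join keys are ordinary columns rather than the index, one way to get to the index-based approach is to move them into the index first. This is only a sketch: the frames and the column name "key" (standing in for df1_keys / df2_keys) are made up for illustration.

```python
import pandas as pd

# Hypothetical frames; "key" stands in for df1_keys / df2_keys.
df1 = pd.DataFrame({"key": [1, 2, 3], "a": [10, 20, 30]})
df2 = pd.DataFrame({"key": [3, 4, 5], "b": [300, 400, 500]})

# Move the join keys into the index so that index set operations apply.
left = df1.set_index("key")
right = df2.set_index("key")

print(left.index.tolist())
print(right.index.tolist())
```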
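You can do set minus using the - operator (this works in the pandas version shown here; newer releases treat - as element-wise subtraction, so use ind.difference(ind2) there):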
In [11]: ind = pd.Index([1, 2, 3])
In [12]: ind2 = pd.Index([3, 4, 5])
In [13]: ind - ind2
Out[13]: Int64Index([1, 2], dtype='int64')
and set union with | and intersection with & (ind.union(ind2) and ind.intersection(ind2) in newer releases):
In [14]: ind | ind2
Out[14]: Int64Index([1, 2, 3, 4, 5], dtype='int64')
In [15]: ind & ind2
Out[15]: Int64Index([3], dtype='int64')
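The operator forms above depend on the pandas version. As a sketch of the same set operations written with the named Index methods, which should behave the same across versions (the variable names minus, union, both and either_not_both are just illustrative):

```python
import pandas as pd

ind = pd.Index([1, 2, 3])
ind2 = pd.Index([3, 4, 5])

minus = ind.difference(ind2)                      # in ind but not in ind2
union = ind.union(ind2)                           # in either index
both = ind.intersection(ind2)                     # in both indexes
either_not_both = ind.symmetric_difference(ind2)  # in exactly one index

print(list(minus), list(union), list(both), list(either_not_both))
```

The symmetric difference is not shown in the session above, but it follows the same pattern and gives the "in one but not both" case.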
So if you have some DataFrames with these indexes, you can reindex before you join:
In [21]: df = pd.DataFrame(np.random.randn(3), ind, ['a']) # ind = df.index
In [22]: df2 = pd.DataFrame(np.random.randn(3), ind2, ['b']) # ind2 = df2.index
In [23]: df.reindex(ind & ind2)
Out[23]:
a
3 1.368518
So now you can build up whatever join you want:
In [24]: df.reindex(ind & ind2).join(df2.reindex(ind & ind2)) # equivalent to inner
Out[24]:
a b
3 1.368518 -1.335534
In [25]: df.reindex(ind - ind2).join(df2.reindex(ind - ind2)) # join on A set minus B
Out[25]:
a b
1 1.193652 NaN
2 0.064467 NaN
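Putting the pieces together, here is a rough sketch of the same reindex-then-join pattern wrapped in a small helper. The helper name join_on and the fixed sample values are assumptions for illustration (fixed numbers are used instead of random ones so the result is reproducible), and the set operations use the Index methods rather than the operators.

```python
import pandas as pd

df = pd.DataFrame({"a": [0.1, 0.2, 0.3]}, index=[1, 2, 3])
df2 = pd.DataFrame({"b": [1.0, 2.0, 3.0]}, index=[3, 4, 5])

def join_on(left, right, idx):
    """Reindex both frames to idx, then join them side by side."""
    return left.reindex(idx).join(right.reindex(idx))

inner = join_on(df, df2, df.index.intersection(df2.index))            # A and B
outer = join_on(df, df2, df.index.union(df2.index))                   # A or B
left_only = join_on(df, df2, df.index.difference(df2.index))          # A minus B
sym_diff = join_on(df, df2, df.index.symmetric_difference(df2.index)) # A xor B

print(left_only)
```

Rows that exist in only one frame come back with NaN in the other frame's columns, as in Out[25] above.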