I have two dataframes:
df1=
A B C
0 A0 B0 C0
1 A1 B1 C1
2 A2 B2 C2
df2=
A B C
0 A2 B2 C10
1 A1 B3 C11
2 A9 B4
In [63]:
df1['A'].isin(df2['A']) & df1['B'].isin(df2['B'])
Out[63]:
0 False
1 False
2 True
you can use the left merge to obtain values that exist in both frames +
values that exist in the first data frame only
In [10]:
left = pd.merge(df1 , df2 , on = ['A' , 'B'] ,how = 'left')
left
Out[10]:
A B C_x C_y
0 A0 B0 C0 NaN
1 A1 B1 C1 NaN
2 A2 B2 C2 C10
then of course values that exist only in the first frame will have NAN
values in columns of the other data frame , then you can filter by this NAN
values by doing the following
In [16]:
left.loc[pd.isnull(left['C_y']) , 'A':'C_x']
Out[16]:
A B C_x
0 A0 B0 C0
1 A1 B1 C1
In [17]:
if you want to get whether the values in A
exists in B
you can do the following
In [20]:
pd.notnull(left['C_y'])
Out[20]:
0 False
1 False
2 True
Name: C_y, dtype: bool
Ideally, one would like to be able to just use ~df1[COLS].isin(df2[COLS]) as a mask, but this requires index labels to match (https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html)
Here is a succinct form that uses .isin but converts the second DataFrame to a dict so that index labels don't need to match:
COLS = ['A', 'B'] # or whichever columns to use for comparison
df1[~df1[COLS].isin(df2[COLS].to_dict(
orient='list')).all(axis=1)]
~df1['A'].isin(df2['A'])
Should get you the series you want
df1[ ~df1['A'].isin(df2['A'])]
The dataframe:
A B C
0 A0 B0 C0
If your version is 0.17.0
then you can use pd.merge and pass the cols of interest, how='left' and set indicator=True
to whether the values are only present in left or both. You can then test whether the appended _merge
col is equal to 'both':
In [102]:
pd.merge(df1, df2, on='A',how='left', indicator=True)['_merge'] == 'both'
Out[102]:
0 False
1 True
2 True
Name: _merge, dtype: bool
In [103]:
pd.merge(df1, df2, on=['A', 'B'],how='left', indicator=True)['_merge'] == 'both'
Out[103]:
0 False
1 False
2 True
Name: _merge, dtype: bool
output from the merge:
In [104]:
pd.merge(df1, df2, on='A',how='left', indicator=True)
Out[104]:
A B_x C_x B_y C_y _merge
0 A0 B0 C0 NaN NaN left_only
1 A1 B1 C1 B3 C11 both
2 A2 B2 C2 B2 C10 both
In [105]:
pd.merge(df1, df2, on=['A', 'B'],how='left', indicator=True)
Out[105]:
A B C_x C_y _merge
0 A0 B0 C0 NaN left_only
1 A1 B1 C1 NaN left_only
2 A2 B2 C2 C10 both