I have two pandas dataframes which have some rows in common.
Suppose dataframe2 is a subset of dataframe1.
How can I get the rows of dataframe1 which are not in dataframe2?
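# Left-merge on all of df2's columns; indicator=True adds a '_merge' column
df = df1.drop_duplicates().merge(df2.drop_duplicates(), on=df2.columns.to_list(),
                                 how='left', indicator=True)
# Keep the rows found only in df1 and drop the helper '_merge' column
df.loc[df._merge == 'left_only', df.columns != '_merge']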
Note that drop_duplicates() is used only to minimize the number of comparisons; the code works without it as well. The most reliable approach is to compare the full row contents rather than the index or just one or two columns. The same code can be used with other filters such as 'both' and 'right_only' to get the corresponding rows (note that 'right_only' rows only appear if the merge uses how='outer' or how='right'). With this syntax the dataframes can have any number of columns and even different indices; the only requirement is that the columns used for the comparison (here, all columns of df2) occur in both dataframes.
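As a rough illustration of the indicator filters, here is a minimal sketch on toy data. The frames df1, df2 and df3, their values, and the helper rows_by_indicator are all made up for this example; df3 is a hypothetical frame that is not a subset of df1, included only to show 'right_only' with an outer merge.

```python
import pandas as pd

# Toy data (assumed for illustration); df2 is a subset of df1 with a different index
df1 = pd.DataFrame({"a": [1, 2, 3, 4], "b": ["w", "x", "y", "z"]})
df2 = pd.DataFrame({"a": [2, 4], "b": ["x", "z"]}, index=[10, 11])


def rows_by_indicator(merged, label):
    """Hypothetical helper: select rows by '_merge' label and drop that column."""
    return merged.loc[merged["_merge"] == label, merged.columns != "_merge"]


# Left merge on all of df2's columns, as in the answer above
merged = df1.drop_duplicates().merge(
    df2.drop_duplicates(), on=df2.columns.to_list(), how="left", indicator=True
)
only_in_df1 = rows_by_indicator(merged, "left_only")  # rows of df1 not in df2
in_both = rows_by_indicator(merged, "both")           # rows present in both

# 'right_only' needs an outer (or right) merge; df3 has one row that df1 lacks
df3 = pd.DataFrame({"a": [2, 5], "b": ["x", "q"]})
merged_outer = df1.merge(df3, on=df3.columns.to_list(), how="outer", indicator=True)
only_in_df3 = rows_by_indicator(merged_outer, "right_only")
```

The differing index on df2 has no effect here, since the merge matches on column values only.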
pandas.concat() coupled with drop_duplicates() is not ideal because it also removes rows that occur only in the dataframe you want to keep and are duplicated there for valid reasons.
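A small sketch of that pitfall, on made-up data. It assumes the concat approach means pd.concat([df1, df2]).drop_duplicates(keep=False), which is the usual form of that idiom; the merge variant below skips drop_duplicates() on df1 so that its legitimate duplicates are preserved.

```python
import pandas as pd

# Toy data (assumed): the row (1, 'w') is legitimately duplicated in df1 and absent from df2
df1 = pd.DataFrame({"a": [1, 1, 2], "b": ["w", "w", "x"]})
df2 = pd.DataFrame({"a": [2], "b": ["x"]})

# concat + drop_duplicates(keep=False) drops every row that occurs more than once,
# so the duplicated (1, 'w') rows are lost along with the shared (2, 'x') row
via_concat = pd.concat([df1, df2]).drop_duplicates(keep=False)

# The merge-with-indicator approach keeps both copies of (1, 'w')
merged = df1.merge(df2, on=df2.columns.to_list(), how="left", indicator=True)
via_merge = merged.loc[merged["_merge"] == "left_only", merged.columns != "_merge"]
```

Here via_concat ends up empty, while via_merge still holds both (1, 'w') rows.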