pandas get rows which are NOT in other dataframe

春和景丽 2020-11-22 02:17

I have two pandas data frames which have some rows in common.

Suppose dataframe2 is a subset of dataframe1.

How can I get the rows of dataframe1 which are not in dataframe2?

13 answers
  •  青春惊慌失措
    2020-11-22 02:56

    As already hinted at, isin (when passed a DataFrame) requires both the column labels and the index to be the same for a match. If the match should depend only on row contents, one way to get the mask for filtering out the rows present in the other frame is to convert the rows of each frame to a (Multi)Index:

    In [76]: import pandas
    In [77]: df1 = pandas.DataFrame(data = {'col1' : [1, 2, 3, 4, 5, 3], 'col2' : [10, 11, 12, 13, 14, 10]})
    In [78]: df2 = pandas.DataFrame(data = {'col1' : [1, 3, 4], 'col2' : [10, 12, 13]})
    In [79]: df1.loc[~df1.set_index(list(df1.columns)).index.isin(df2.set_index(list(df2.columns)).index)]
    Out[79]:
       col1  col2
    1     2    11
    4     5    14
    5     3    10
    

    If the index should be taken into account as well, set_index has the keyword argument append, which appends the columns to the existing index instead of replacing it. If the columns of the two frames do not line up, list(df.columns) can be replaced with explicit column specifications to align the data.
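Both variations can be sketched as follows. This is only an illustration under assumed data: the index labels [0, 7, 3] given to df2, the reordered frame df3, and the variable names are made up for the example and are not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3], 'col2': [10, 11, 12, 13, 14, 10]})
# Assumed data: same row contents as before, but only labels 0 and 3 line up with df1.
df2 = pd.DataFrame({'col1': [1, 3, 4], 'col2': [10, 12, 13]}, index=[0, 7, 3])

# Variation 1: take the index into account by appending the columns to it.
key1 = df1.set_index(list(df1.columns), append=True).index
key2 = df2.set_index(list(df2.columns), append=True).index
index_aware = df1.loc[~key1.isin(key2)]

# Variation 2: the columns do not line up (here: different order), so name
# them explicitly instead of relying on list(df.columns).
df3 = df2[['col2', 'col1']]
cols = ['col1', 'col2']
content_only = df1.loc[~df1.set_index(cols).index.isin(df3.set_index(cols).index)]

print(index_aware)
print(content_only)
```

With append=True the row (3, 12) of df1 is kept, because its label 2 differs from the label 7 it carries in df2. The second variation ignores labels and compares contents only.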

    pandas.MultiIndex.from_tuples(df.to_records(index = False).tolist())
    

    could alternatively be used to create the indices, though I doubt this is more efficient.
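For completeness, a minimal sketch of that alternative on the same example data. The helper name rows_as_index is hypothetical and introduced only for this illustration. No claim about relative speed is made here.

```python
import pandas as pd

df1 = pd.DataFrame({'col1': [1, 2, 3, 4, 5, 3], 'col2': [10, 11, 12, 13, 14, 10]})
df2 = pd.DataFrame({'col1': [1, 3, 4], 'col2': [10, 12, 13]})

def rows_as_index(df):
    # Hypothetical helper: one tuple per row, ignoring the frame's own index.
    return pd.MultiIndex.from_tuples(df.to_records(index=False).tolist())

# Boolean mask: True where a row of df1 also occurs in df2.
mask = rows_as_index(df1).isin(rows_as_index(df2))
result = df1.loc[~mask]

# Same filter built with set_index, for comparison.
expected = df1.loc[~df1.set_index(list(df1.columns)).index.isin(
    df2.set_index(list(df2.columns)).index)]

print(result)
```

Both constructions select the same rows of df1, so the choice between them is mostly a matter of readability.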
