Pandas: Find rows which don't exist in another DataFrame by multiple columns

滥情空心 2020-12-23 10:25

This is the same question as "python pandas: how to find rows in one dataframe but not in another?", but matching on multiple columns.

This is the setup:

import pandas as pd

d         


        
2 Answers
  • 2020-12-23 10:47

    Since pandas 0.17.0, merge accepts an indicator parameter. Passing indicator=True adds a _merge column that tells you whether each row is present only in the left frame, only in the right frame, or in both:

    In [5]:
    merged = df.merge(other, how='left', indicator=True)
    merged
    
    Out[5]:
       col1 col2  extra_col     _merge
    0     0    a       this  left_only
    1     1    b         is       both
    2     1    c       just  left_only
    3     2    b  something  left_only
    
    In [6]:    
    merged[merged['_merge']=='left_only']
    
    Out[6]:
       col1 col2  extra_col     _merge
    0     0    a       this  left_only
    2     1    c       just  left_only
    3     2    b  something  left_only
    

    So you can now filter the merged DataFrame by selecting only the 'left_only' rows, i.e. the rows of df that have no match in other.
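
    The steps above can be put together as a self-contained sketch. The sample data here is an assumption: df is rebuilt from the outputs shown in this answer, and other is a hypothetical frame that contains the key (1, 'b') plus one key that does not occur in df. Passing on=cols and de-duplicating other are additions for safety, not part of the answer as posted.

```python
import pandas as pd

# Assumed sample data: df matches the outputs shown above; `other` is a
# hypothetical frame holding the key (1, 'b') and one non-matching key.
df = pd.DataFrame({
    "col1": [0, 1, 1, 2],
    "col2": ["a", "b", "c", "b"],
    "extra_col": ["this", "is", "just", "something"],
})
other = pd.DataFrame({"col1": [1, 2], "col2": ["b", "c"]})

cols = ["col1", "col2"]

# Left-merge on the key columns only. Dropping duplicate keys from `other`
# first means a repeated key cannot multiply rows of df.
merged = df.merge(
    other[cols].drop_duplicates(),
    on=cols,
    how="left",
    indicator=True,
)

# Keep the rows found only in df, then drop the helper column.
only_in_df = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(only_in_df)
```

    Note that merge returns a frame with a fresh integer index, so if df carries a meaningful index it would need to be restored separately.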

  • 2020-12-23 11:00

    Interesting

    cols = ['col1', 'col2']
    # Get copies whose index is built from the columns of interest
    df2 = df.set_index(cols)
    other2 = other.set_index(cols)
    # Flag rows of df whose (col1, col2) key appears in other, then negate with ~
    df[~df2.index.isin(other2.index)]
    

    Returns:

        col1 col2  extra_col
    0     0    a       this
    2     1    c       just
    3     2    b  something
    

    Seems a little bit more elegant...
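
    As a runnable sketch of the same idea, the index comparison can be wrapped in a small helper. The name rows_not_in is hypothetical, and the sample data is an assumption: df is rebuilt from the output shown above, and other is a made-up frame that contains the key (1, 'b'). It uses MultiIndex.from_frame instead of set_index, which builds the same key tuples without copying the remaining columns.

```python
import pandas as pd


def rows_not_in(df, other, cols):
    """Return the rows of df whose key in `cols` does not occur in other.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Build one MultiIndex of key tuples per frame.
    df_keys = pd.MultiIndex.from_frame(df[cols])
    other_keys = pd.MultiIndex.from_frame(other[cols])
    # isin returns a boolean array; ~ flips it to "key not found in other".
    return df[~df_keys.isin(other_keys)]


# Assumed sample data, consistent with the output shown above.
df = pd.DataFrame({
    "col1": [0, 1, 1, 2],
    "col2": ["a", "b", "c", "b"],
    "extra_col": ["this", "is", "just", "something"],
})
other = pd.DataFrame({"col1": [1, 2], "col2": ["b", "c"]})

result = rows_not_in(df, other, ["col1", "col2"])
print(result)
```

    Because the boolean mask is applied to df directly, the result keeps df's original index labels and column dtypes.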
