Remove one dataframe from another with Pandas

挽巷 2021-01-19 23:55

I have two dataframes of different sizes (df1 and df2). I would like to remove from df1 all the rows that are also present in df2.

5 answers
  • 2021-01-20 00:33

    I think the cleanest way is to work on the index.

    We have a base dataframe D and want to remove a subset D1. Let the output be D2. This assumes D1 still carries the index labels its rows have in D:

    D2 = D[~D.index.isin(D1.index)].reset_index()
    
  • 2021-01-20 00:43

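    The cleanest way I found was to use drop from pandas with the index of the dataframe you want to remove. This matches on index labels, not on row values, so it only works if df2 kept the labels its rows have in df1:

    df1.drop(df2.index, axis=0, inplace=True)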
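
The index-based answers above match on index labels rather than on row contents. The following is a minimal sketch of what that means in practice; the small frames and the names `df2_slice` and `df2_fresh` are made up for illustration and are not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'rty', 'tyu'],
                    'B': [5, 6, 9, 7]})

# Case 1: df2 is a slice of df1, so it keeps df1's index labels (1 and 3).
df2_slice = df1[df1['B'].isin([6, 7])]
kept = df1.drop(df2_slice.index).reset_index(drop=True)
print(kept)   # the 'qwe' and 'rty' rows remain

# Case 2: df2 was built separately, so its index labels are 0 and 1.
df2_fresh = pd.DataFrame({'A': ['wer', 'tyu'], 'B': [6, 7]})
wrong = df1.drop(df2_fresh.index).reset_index(drop=True)
print(wrong)  # labels 0 and 1 were dropped, so 'rty' and 'tyu' remain
```

If df2 was created independently of df1, as in the second case, a value-based approach such as a merge is likely the safer choice.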
    
  • 2021-01-20 00:50

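    Use merge with a left join and indicator=True, keep only the rows that occur in df1 alone with query, then remove the helper column with drop. A left join keeps the row order of df1 and never brings in rows that exist only in df2:

    df = (pd.merge(df1, df2, on=['A','B'], how='left', indicator=True)
            .query("_merge == 'left_only'")
            .drop('_merge', axis=1)
            .reset_index(drop=True))
    print(df)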
         A  B  C
    0  qwe  5  a
    1  rty  9  f
    2  iop  1  k
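
For a version of the merge approach that runs on its own, here is a minimal sketch using the sample frames from the isin answer on this page. The helper name `anti_join` is hypothetical. The sketch uses a left join so that df1's row order is preserved and rows found only in df2 never appear, and it de-duplicates the key columns of df2 first so that repeated keys there cannot multiply rows of df1.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k']})
df2 = pd.DataFrame({'A': ['wer', 'tyu'],
                    'B': [6, 7]})


def anti_join(left, right, keys):
    """Return the rows of `left` whose `keys` values have no match in `right`."""
    # indicator=True adds a '_merge' column saying where each row was found.
    marked = left.merge(right[keys].drop_duplicates(),
                        on=keys, how='left', indicator=True)
    return (marked[marked['_merge'] == 'left_only']
            .drop(columns='_merge')
            .reset_index(drop=True))


result = anti_join(df1, df2, ['A', 'B'])
print(result)
```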
    
  • 2021-01-20 00:50

    pandas has a method called isin, but it tests membership of individual values, so we first need a single key per row. We can define a lambda function that builds such a key from the existing 'A' and 'B' columns of df1 and df2. We then negate the resulting mask (as we want the rows not in df2) and reset the index:

    import pandas as pd
    
    df1 = pd.DataFrame({'A' : ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop'],
                        'B' : [    5,     6,     6,     9,     7,     7,     7,     1],
                        'C' : ['a'  ,   's',   'd',   'f',   'g',   'h',   'j',   'k']})
    
    df2 = pd.DataFrame({'A' : ['wer', 'tyu'],
                        'B' : [    6,     7]})
    
    # Build one string key per row, e.g. 'wer_6', from columns A and B.
    unique_ind = lambda df: df['A'].astype(str) + '_' + df['B'].astype(str)
    print(df1[~unique_ind(df1).isin(unique_ind(df2))].reset_index(drop=True))
    

    This prints:

         A  B  C
    0  qwe  5  a
    1  rty  9  f
    2  iop  1  k
    
  • 2021-01-20 00:53

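    You can use np.isin (the replacement for np.in1d, which is deprecated as of NumPy 2.0) to check whether the values of a row in df1 occur in df2, and then use the negated result as a mask to select rows from df1. Note that each value is tested against all values of df2 rather than against whole rows, so a row such as ('wer', 7) would also be removed.

    import numpy as np

    mask = df1[['A','B']].apply(lambda x: np.isin(x, df2).all(), axis=1)
    df1[~mask].reset_index(drop=True)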
    Out[115]: 
         A  B  C
    0  qwe  5  a
    1  rty  9  f
    2  iop  1  k
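
The mask above compares individual values against everything in df2. If whole (A, B) pairs should be compared instead, one possible alternative (a different technique from this answer, not the author's method) is to turn the key columns into a MultiIndex and use its isin. The sketch below reuses the sample data from this page plus one extra row, ('wer', 7, 'x'), which is added here purely for illustration: its values both occur in df2, but never together in one row.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['qwe', 'wer', 'wer', 'rty', 'tyu', 'tyu', 'tyu', 'iop', 'wer'],
                    'B': [5, 6, 6, 9, 7, 7, 7, 1, 7],
                    'C': ['a', 's', 'd', 'f', 'g', 'h', 'j', 'k', 'x']})
df2 = pd.DataFrame({'A': ['wer', 'tyu'],
                    'B': [6, 7]})

keys = ['A', 'B']
# Each entry of these MultiIndexes is one (A, B) tuple.
left_keys = pd.MultiIndex.from_frame(df1[keys])
right_keys = pd.MultiIndex.from_frame(df2[keys])

# True where the whole (A, B) pair of a df1 row occurs in df2.
mask = left_keys.isin(right_keys)
result = df1[~mask].reset_index(drop=True)
print(result)
```

With this pairwise mask the extra ('wer', 7) row is kept, because that exact pair does not occur in df2.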
    