pandas return columns in dataframe that are not in other dataframe

暖寄归人 2020-12-20 03:22

I have two dataframes that look like this:

df_1 = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': [100, 200, 300, 400],
    'C': [2, 3, 4, 5]
})
                   


        
4 Answers
  • 2020-12-20 03:41

    Pandas Index objects have set-like methods, so you can directly do:

    df_2.columns.difference(df_1.columns)
    Index([u'D'], dtype='object')
    

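    In older pandas versions you could also use the operators &, | and ^ to compute intersection, union and symmetric difference. They were later deprecated as set operations, and from pandas 2.0 on they act element-wise instead, so prefer .intersection(), .union() and .symmetric_difference() in current code: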

    df_1.columns & df_2.columns
    Index([u'B', u'C'], dtype='object')
    
    df_1.columns | df_2.columns
    Index([u'A', u'B', u'C', u'D'], dtype='object')
    
    df_1.columns ^ df_2.columns
    Index([u'A', u'D'], dtype='object')
    

    There used to be a - operator for difference, now deprecated:

    df_2.columns - df_1.columns
    FutureWarning: using '-' to provide set differences with Indexes is deprecated, use .difference()
    Index([u'D'], dtype='object')
    
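    For a runnable version of the difference example, here is a minimal sketch. The values in df_2 are an assumption, since the question's definition is cut off; only its column labels B, C and D are implied by the outputs above. Index.difference returns its result sorted where possible, so a boolean-mask variant is included for cases where df_2's original column order should be kept.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': [100, 200, 300, 400],
    'C': [2, 3, 4, 5],
})

# Hypothetical df_2: only the labels B, C and D are implied by the answers
df_2 = pd.DataFrame({
    'D': [10, 20, 30, 40],
    'C': [2, 3, 4, 5],
    'B': [100, 200, 300, 400],
})

# Labels present in df_2 but not in df_1 (sorted result)
only_in_df_2 = df_2.columns.difference(df_1.columns)

# Same labels, but kept in df_2's original column order
only_in_df_2_ordered = df_2.columns[~df_2.columns.isin(df_1.columns)]

# Select the actual columns rather than just their labels
extra_columns = df_2[only_in_df_2]
print(extra_columns)
```

    Either Index can be passed straight back to df_2 to select the columns themselves.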
  • 2020-12-20 03:44

    NumPy solution with numpy.setdiff1d:

    a = np.setdiff1d(df_2.columns, df_1.columns)
    print (a)
    ['D']
    

    Pandas solution with Index.difference:

    a = df_2.columns.difference(df_1.columns)
    print (a)
    Index(['D'], dtype='object')
    

    Other pandas methods are intersection, union and symmetric_difference:

    print (df_2.columns.intersection(df_1.columns))
    Index(['B', 'C'], dtype='object')
    
    print (df_2.columns.union(df_1.columns))
    Index(['A', 'B', 'C', 'D'], dtype='object')
    
    print (df_2.columns.symmetric_difference(df_1.columns))
    Index(['A', 'D'], dtype='object')
    

    The corresponding NumPy functions are intersect1d, union1d and setxor1d:

    print (np.intersect1d(df_2.columns, df_1.columns))
    ['B' 'C']
    
    print (np.union1d(df_2.columns, df_1.columns))
    ['A' 'B' 'C' 'D']
    
    print (np.setxor1d(df_2.columns, df_1.columns))
    ['A' 'D']
    
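    A self-contained sketch of the NumPy route, for comparison. np.setdiff1d returns a sorted NumPy array of labels rather than a pandas Index, and that array can be used to pull the columns out of df_2. The frames here are shortened and the df_2 values are made up for illustration.

```python
import numpy as np
import pandas as pd

df_1 = pd.DataFrame({'A': [1.0, 2.0], 'B': [100, 200], 'C': [2, 3]})
# Hypothetical df_2: values are illustrative, labels follow the answers
df_2 = pd.DataFrame({'B': [100, 200], 'C': [2, 3], 'D': [10, 20]})

# Sorted array of labels that are in df_2 but not in df_1
missing = np.setdiff1d(df_2.columns, df_1.columns)

# Use the label array to select the matching columns
result = df_2[missing]
print(result)
```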
  • 2020-12-20 03:44

    You can use:

    set(df_2.columns.values) - set(df_1.columns.values)
    

    which returns a set containing column labels of columns in df_2 but not in df_1.

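    Since a plain set has no defined order, it may be convenient to wrap the set difference in a small helper that returns a sorted list, which can then be used to index the DataFrame. The helper name columns_not_in and the df_2 values are hypothetical, introduced only for this sketch.

```python
import pandas as pd


def columns_not_in(left, right):
    """Hypothetical helper: sorted labels of columns in `left` missing from `right`."""
    # sorted() assumes the labels are mutually comparable (e.g. all strings)
    return sorted(set(left.columns) - set(right.columns))


df_1 = pd.DataFrame({'A': [1.0, 2.0], 'B': [100, 200], 'C': [2, 3]})
# Hypothetical df_2: values are illustrative, labels follow the answers
df_2 = pd.DataFrame({'B': [100, 200], 'C': [2, 3], 'D': [10, 20]})

labels = columns_not_in(df_2, df_1)
subset = df_2[labels]  # a list of labels is a valid column indexer
print(labels)
```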
  • 2020-12-20 04:02

    Here it is, buddy:

    set(df_2.columns).difference(df_1.columns)
    Out[76]: {'D'}
    