Mapping methods across multiple columns in a Pandas DataFrame

挽巷 2021-01-23 21:31

I have a Pandas dataframe where the values are lists:

import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})
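DF

I would like to add a boolean column x_sub_y that is True when every element of the list in X also appears in the list in Y on the same row, i.e. when X is a subset of Y.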
3 Answers
  • 2021-01-23 21:38

    Option 1
    set conversion and difference using np.where

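    import numpy as np
    # Convert every cell to a set (applymap was renamed DataFrame.map in pandas 2.1)
    df_temp = DF.applymap(set)
    # An empty difference X - Y is falsy, so np.where maps it to True (X is a subset of Y)
    DF['x_sub_y'] = np.where(df_temp.X - df_temp.Y, False, True)
    DF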
            X          Y  x_sub_y
    0  [1, 5]  [1, 2, 5]     True
    1  [1, 2]  [1, 3, 5]    False
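Option 1 rests on two facts: the difference of two sets keeps only the elements of the left set that are missing from the right one, and an empty set is falsy. A minimal plain-Python sketch of that test on the two example rows (the helper name `is_subset_via_difference` is hypothetical, not from the answer):

```python
def is_subset_via_difference(x, y):
    # set(x) - set(y) keeps the elements of x that do not appear in y;
    # an empty (falsy) result means every element of x is in y
    return not (set(x) - set(y))

rows = [([1, 5], [1, 2, 5]), ([1, 2], [1, 3, 5])]
results = [is_subset_via_difference(x, y) for x, y in rows]
print(results)  # [True, False]

# The same test is available directly as the set subset operator <=
assert results == [set(x) <= set(y) for x, y in rows]
```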
    

    Option 2
    Faster, astype conversion

    DF['x_sub_y'] = ~(DF.X.apply(set) - DF.Y.apply(set)).astype(bool)
    DF 
            X          Y  x_sub_y
    0  [1, 5]  [1, 2, 5]     True
    1  [1, 2]  [1, 3, 5]    False
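Option 2 drops `np.where` and relies on `astype(bool)` applying Python truthiness to each object in an object-dtype Series: an empty set becomes `False`, a non-empty one `True`, and `~` inverts the result. A small sketch isolating that step, with the two row-wise differences written out by hand:

```python
import pandas as pd

# Row-wise set differences X - Y for the example frame: row 0 is empty, row 1 is {2}
diffs = pd.Series([set(), {2}])
nonempty = diffs.astype(bool)   # truthiness of each set
x_sub_y = ~nonempty             # empty difference -> X is a subset of Y -> True
print(x_sub_y.tolist())  # [True, False]
```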
    

    Option 3
    Fun with np.vectorize

    def foo(x):
         return not x
    
    v = np.vectorize(foo)    
    DF['x_sub_y'] = v(DF.X.apply(set) - DF.Y.apply(set)) 
    DF
            X          Y  x_sub_y
    0  [1, 5]  [1, 2, 5]     True
    1  [1, 2]  [1, 3, 5]    False
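Unless `otypes` is given, `np.vectorize` calls the function on the first element to determine the output type, and NumPy's documentation describes it as provided for convenience rather than performance (essentially a Python `for` loop). A sketch of Option 3 with the output type declared up front (the lambda standing in for `foo` is only a stylistic choice):

```python
import numpy as np
import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})

# Row-wise set differences; an empty set means X is a subset of Y
diff = DF.X.apply(set) - DF.Y.apply(set)

# otypes=[bool] fixes the output dtype instead of inferring it
# from a trial call on the first element
is_empty = np.vectorize(lambda s: not s, otypes=[bool])
DF['x_sub_y'] = is_empty(diff)
print(DF['x_sub_y'].tolist())  # [True, False]
```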
    

    Extending Scott Boston's answer for speed using the same approach:

    def foo(x, y):
        return set(x).issubset(y)
    
    v = np.vectorize(foo)
    
    DF['x_sub_y'] = v(DF.X, DF.Y)
    DF
            X          Y  x_sub_y
    0  [1, 5]  [1, 2, 5]     True
    1  [1, 2]  [1, 3, 5]    False
    

    Small

    1000 loops, best of 3: 460 µs per loop           # Before       
    10000 loops, best of 3: 103 µs per loop          # After
    

    Large (df * 10000)

    1 loop, best of 3: 1.26 s per loop               # Before   
    100 loops, best of 3: 13.3 ms per loop           # After
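To reproduce the Before/After comparison on your own machine, here is a minimal `timeit` sketch. It assumes "Large (df * 10000)" means the two example rows repeated 10,000 times with `pd.concat`, which the answer does not spell out, and it adds a plain list comprehension over `zip` as a third, unbenchmarked candidate, since `np.vectorize` is itself essentially a Python loop. The function names are hypothetical, and no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})
# Assumption: "large" = the two example rows repeated 10,000 times
big = pd.concat([DF] * 10000, ignore_index=True)

def with_apply(df):
    # Row-wise DataFrame.apply, as in the set/issubset answer
    return df.apply(lambda r: set(r.X).issubset(set(r.Y)), axis=1).to_numpy()

def with_vectorize(df):
    # np.vectorize over the two columns, as in the extended answer
    v = np.vectorize(lambda x, y: set(x).issubset(y), otypes=[bool])
    return v(df.X, df.Y)

def with_comprehension(df):
    # Plain Python loop over paired column values (hypothetical alternative)
    return np.array([set(x).issubset(y) for x, y in zip(df.X, df.Y)])

# All three approaches must agree before timing them
expected = with_apply(big)
for fn in (with_vectorize, with_comprehension):
    assert (fn(big) == expected).all()

for fn in (with_apply, with_vectorize, with_comprehension):
    seconds = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```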
    
  • 2021-01-23 21:39

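    Or you can use set: X is a subset of Y exactly when the concatenation X + Y contains no values beyond those already in Y.

    DF['x_sub_y'] = DF.X + DF.Y   # concatenate the two lists on each row
    # Compare sets rather than lists so the result does not depend on element order or duplicates
    DF['x_sub_y'] = DF['x_sub_y'].apply(set) == DF.Y.apply(set)
    DF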
    Out[691]: 
            X          Y  x_sub_y
    0  [1, 5]  [1, 2, 5]     True
    1  [1, 2]  [1, 3, 5]    False
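This answer rests on the identity that adding the elements of X to Y introduces nothing new exactly when X is a subset of Y, i.e. set(X + Y) == set(Y). A quick plain-Python check of that identity; the last two rows are extra illustrative cases (Y out of order, X with duplicates), not data from the question:

```python
rows = [([1, 5], [1, 2, 5]), ([1, 2], [1, 3, 5]), ([1], [5, 1]), ([3, 3], [3])]

for x, y in rows:
    union_test = set(x + y) == set(y)     # the union adds nothing new...
    subset_test = set(x).issubset(y)      # ...exactly when x is a subset of y
    assert union_test == subset_test
    print(x, y, union_test)
```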
    
  • 2021-01-23 21:54

    Use set and issubset:

    DF.assign(x_sub_y = DF.apply(lambda x: set(x.X).issubset(set(x.Y)), axis=1))
    

    Output:

            X          Y  x_sub_y
    0  [1, 5]  [1, 2, 5]     True
    1  [1, 2]  [1, 3, 5]    False
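`DataFrame.assign` returns a new DataFrame and leaves `DF` unchanged, so the result has to be stored (for example `DF = DF.assign(...)`) if the column should persist. `set.issubset` also accepts any iterable, so the inner `set(x.Y)` can be dropped. A small sketch (the name `out` is illustrative):

```python
import pandas as pd

DF = pd.DataFrame({'X': [[1, 5], [1, 2]], 'Y': [[1, 2, 5], [1, 3, 5]]})

# assign() builds and returns a new DataFrame; DF itself is not modified
out = DF.assign(x_sub_y=DF.apply(lambda r: set(r.X).issubset(r.Y), axis=1))

print(list(DF.columns))         # ['X', 'Y']
print(list(out.columns))        # ['X', 'Y', 'x_sub_y']
print(out['x_sub_y'].tolist())  # [True, False]
```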
    