Filtering pandas dataframe with multiple Boolean columns

闹比i 2020-12-05 07:26

I am trying to filter a df using several Boolean variables that are a part of the df, but have been unable to do so.

Sample data:

A | B | C | D
John         


        
5 answers
  • 2020-12-05 08:16
    In [82]: d
    Out[82]:
                 A   B      C      D
    0     John Doe  45   True  False
    1   Jane Smith  32  False  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
    

    Solution 1:

    In [83]: d.loc[d.C | d.D]
    Out[83]:
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
    

    Solution 2:

    In [94]: d[d[['C','D']].any(axis=1)]
    Out[94]:
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
    

    Solution 3:

    In [95]: d.query("C or D")
    Out[95]:
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
    

    PS: your own attempt will work too if you change it to:

    df[(df['C']==True) | (df['D']==True)]
    

    Pandas docs - boolean indexing
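
    For anyone who wants to run the solutions above side by side, here is a minimal self-contained sketch. The frame is rebuilt by hand from the output printed at the top of this answer, and names such as by_operator are only illustrative, not part of the original post.

```python
import pandas as pd

# Sample frame rebuilt from the printed output (an assumption about the asker's data).
d = pd.DataFrame({
    "A": ["John Doe", "Jane Smith", "Alan Holmes", "Eric Lamar"],
    "B": [45, 32, 55, 29],
    "C": [True, False, False, True],
    "D": [False, False, True, True],
})

by_operator = d.loc[d["C"] | d["D"]]                   # element-wise OR of two Boolean Series
by_any = d[d[["C", "D"]].any(axis=1)]                  # True where any listed column is True
by_query = d.query("C or D")                           # query() evaluates `or` element-wise
by_compare = d[(d["C"] == True) | (d["D"] == True)]    # explicit comparison, same mask

print(by_operator)
```

    All four expressions should select the same rows; the explicit == True comparison is redundant when the columns are already Boolean.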


    Why should we NOT use the "PEP-compliant" df["col_name"] is True instead of df["col_name"] == True?

    In [11]: df = pd.DataFrame({"col":[True, True, True]})
    
    In [12]: df
    Out[12]:
        col
    0  True
    1  True
    2  True
    
    In [13]: df["col"] is True
    Out[13]: False               # <----- oops, that's not exactly what we wanted
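
    The reason is that is tests object identity, so it compares the whole Series object with the singleton True and always yields a plain False. A small sketch of the difference (the variable names identity and elementwise are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({"col": [True, False, True]})

identity = df["col"] is True       # compares the Series object itself with True
elementwise = df["col"] == True    # compares each element, returns a Boolean Series

print(identity)                # a single Python bool, useless as a row mask
print(elementwise.tolist())    # one Boolean per row
print(df[df["col"]])           # the column is already a mask, so no comparison is needed
```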
    
  • 2020-12-05 08:19

    Or

    d[d.eval('C or D')]
    
    Out[1065]:
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
    
  • 2020-12-05 08:21

    Hooray! More options!

    np.where

    df[np.where(df.C | df.D, True, False)]
    
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True  
    

    pd.Index.where on df.index (the NaN fill turns the integer index into floats)

    df.loc[df.index.where(df.C | df.D).dropna()]
    
                   A   B      C      D
    0.0     John Doe  45   True  False
    2.0  Alan Holmes  55  False   True
    3.0   Eric Lamar  29   True   True
    

    df.select_dtypes

    df[df.select_dtypes([bool]).any(axis=1)]
    
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
    

    Abusing np.select

    df.iloc[np.select([df.C | df.D], [df.index])].drop_duplicates()
    
                 A   B      C      D
    0     John Doe  45   True  False
    2  Alan Holmes  55  False   True
    3   Eric Lamar  29   True   True
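
    A caveat on the np.select trick: it only gives the right rows here because row 0 happens to match. np.select fills every non-matching position with its default of 0, so row 0 gets selected regardless. The sketch below uses a small made-up frame (not the asker's data) to show this, with np.flatnonzero as one possible alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which row 0 does NOT satisfy C or D.
df = pd.DataFrame({
    "A": ["Jane Smith", "John Doe", "Alan Holmes"],
    "C": [False, True, False],
    "D": [False, False, True],
})
mask = (df["C"] | df["D"]).to_numpy()

# Non-matching positions fall back to np.select's default of 0,
# so row 0 is picked even though its mask value is False.
positions = np.select([mask], [np.arange(len(df))])
via_select = df.iloc[positions].drop_duplicates()

# np.flatnonzero returns only the positions where the mask is True.
via_flatnonzero = df.iloc[np.flatnonzero(mask)]

print(via_select.index.tolist())
print(via_flatnonzero.index.tolist())
```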
    
  • 2020-12-05 08:24

    You could try this (compare against the Boolean True, not the string 'True', which would match nothing in a Boolean column):

    df1 = df[(df['C'] == True) | (df['D'] == True)]
    

    Note:

    1. The or logical operator needs to be replaced by the bitwise | operator.
    2. Ensure that () are used to enclose each of the operands.
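
    Both notes can be checked with a short sketch. The two-column frame is made up for illustration; the point is that or raises a ValueError on a Series, while | with parenthesised operands builds the row mask.

```python
import pandas as pd

df = pd.DataFrame({"C": [True, False, False, True],
                   "D": [False, False, True, True]})

try:
    df[df["C"] or df["D"]]    # `or` calls bool() on a whole Series
    raised = False
except ValueError:
    raised = True             # pandas refuses: the truth value of a Series is ambiguous

# Parenthesise each comparison, because `|` binds more tightly than `==`.
either = df[(df["C"] == True) | (df["D"] == True)]

print(raised, either.index.tolist())
```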
  • 2020-12-05 08:25

    So, the easiest way to do this:

    import numpy as np
    import pandas as pd

    # Sample records; the last one has a missing sale value
    students = [('jack1', 'Apples1', 341),
                ('Riti1', 'Mangos1', 311),
                ('Aadi1', 'Grapes1', 301),
                ('Sonia1', 'Apples1', 321),
                ('Lucy1', 'Mangos1', 331),
                ('Mike1', 'Apples1', 351),
                ('Mik', 'Apples1', np.nan)]

    # Create a DataFrame object
    df = pd.DataFrame(students, columns=['Name1', 'Product1', 'Sale1'])
    print(df)
    
    
        Name1 Product1  Sale1
    0   jack1  Apples1  341.0
    1   Riti1  Mangos1  311.0
    2   Aadi1  Grapes1  301.0
    3  Sonia1  Apples1  321.0
    4   Lucy1  Mangos1  331.0
    5   Mike1  Apples1  351.0
    6     Mik  Apples1    NaN
    
    # Select rows where the 'Product1' column equals 'Apples1'
    subset = df[df['Product1'] == 'Apples1']
    print(subset)
    
        Name1 Product1  Sale1
    0   jack1  Apples1  341.0
    3  Sonia1  Apples1  321.0
    5   Mike1  Apples1  351.0
    6     Mik  Apples1    NaN
    
    # Select rows where 'Product1' equals 'Apples1' AND 'Sale1' is not null
    subsetx = df[(df['Product1'] == 'Apples1') & (df['Sale1'].notnull())]
    print(subsetx)
    
        Name1 Product1  Sale1
    0   jack1  Apples1  341.0
    3  Sonia1  Apples1  321.0
    5   Mike1  Apples1  351.0
    
    # Select rows where 'Product1' equals 'Apples1' AND 'Sale1' equals 351
    subsetx = df[(df['Product1'] == 'Apples1') & (df['Sale1'] == 351)]
    print(subsetx)
    
       Name1 Product1  Sale1
    5  Mike1  Apples1  351.0
    
    # Another example: rows whose 'Product1' is one of several values
    subsetData = df[df['Product1'].isin(['Mangos1', 'Grapes1'])]
    print(subsetData)
    
       Name1 Product1  Sale1
    1  Riti1  Mangos1  311.0
    2  Aadi1  Grapes1  301.0
    4  Lucy1  Mangos1  331.0
    
    

    Here is the source of this code: https://thispointer.com/python-pandas-select-rows-in-dataframe-by-conditions-on-multiple-columns/
    I added minor changes to it.
