Filter out rows with more than a certain number of NaNs

谎友^ 2020-12-09 06:09

In a Pandas dataframe, I would like to filter out all the rows that have more than 2 NaNs.

Essentially, I have 4 columns and I would like to keep only t

3 Answers
  •  醉梦人生
    2020-12-09 07:07

    The following should work

    df.dropna(thresh=2)
    

    See the online docs

    This keeps only the rows that have at least 2 non-NaN values; any row with fewer than 2 non-NaN values is dropped.

    Example:

    In [25]:
    
    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'a':[1,2,np.nan,4,5], 'b':[np.nan,2,np.nan,4,5], 'c':[1,2,np.nan,np.nan,np.nan], 'd':[1,2,3,np.nan,5]})
    
    df
    
    Out[25]:
    
        a   b   c   d
    0   1 NaN   1   1
    1   2   2   2   2
    2 NaN NaN NaN   3
    3   4   4 NaN NaN
    4   5   5 NaN   5
    
    [5 rows x 4 columns]
    
    In [26]:
    
    df.dropna(thresh=2)
    
    Out[26]:
    
       a   b   c   d
    0  1 NaN   1   1
    1  2   2   2   2
    3  4   4 NaN NaN
    4  5   5 NaN   5
    
    [4 rows x 4 columns]
    

    EDIT

    This works for the example above, but note that you have to know the number of columns and set the thresh value accordingly. I originally thought thresh meant the number of NaN values, but it actually means the minimum number of non-NaN values a row needs in order to be kept.
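
If you would rather not hard-code the column count, one possible approach is to derive thresh from the frame's shape, or to count the NaNs per row directly. The sketch below reuses the example frame from this answer; `max_nans`, `kept_by_thresh` and `kept_by_count` are names made up for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

# Same example data as above.
df = pd.DataFrame({
    "a": [1, 2, np.nan, 4, 5],
    "b": [np.nan, 2, np.nan, 4, 5],
    "c": [1, 2, np.nan, np.nan, np.nan],
    "d": [1, 2, 3, np.nan, 5],
})

max_nans = 2  # hypothetical name: the most NaNs a row may contain

# thresh counts non-NaN values, so convert "at most max_nans NaNs"
# into "at least (number of columns - max_nans) non-NaN values".
kept_by_thresh = df.dropna(thresh=df.shape[1] - max_nans)

# Alternative: count the NaNs in each row and filter with a boolean mask.
kept_by_count = df[df.isna().sum(axis=1) <= max_nans]

print(kept_by_thresh)
```

Both variants drop only row 2 here, since it is the only row with more than 2 NaNs. The mask version reads closer to how the question is phrased, while the thresh version stays within dropna.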
