How to find which columns contain any NaN value in Pandas dataframe

醉酒成梦 2020-11-28 02:17

Given a pandas dataframe containing possible NaN values scattered here and there:

Question: How do I determine which columns contain NaN values? In

8 Answers
  • 2020-11-28 02:35

    I had a problem where I had too many columns to inspect visually on the screen, so here is a short list comprehension that filters and returns the offending columns:

    nan_cols = [i for i in df.columns if df[i].isnull().any()]
    

    if that's helpful to anyone
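
    As a minimal sketch of how that comprehension behaves, the snippet below runs it on a small made-up frame (the data and the name `nan_cols_vectorized` are assumptions for illustration, not part of the answer) and compares it with the boolean-mask form on the column index:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'a' and 'b' contain NaN, 'c' does not.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# List comprehension from the answer: check each column in turn.
nan_cols = [i for i in df.columns if df[i].isnull().any()]

# Same result without an explicit loop: mask the column index.
nan_cols_vectorized = df.columns[df.isnull().any()].tolist()

print(nan_cols)             # ['a', 'b']
print(nan_cols_vectorized)  # ['a', 'b']
```

    Both forms give the same list; the comprehension is arguably easier to extend with extra per-column conditions.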

  • 2020-11-28 02:42

    In datasets with a large number of columns, it is even better to see how many columns contain null values and how many don't.

    print("No. of columns containing null values")
    print(len(df.columns[df.isna().any()]))
    
    print("No. of columns not containing null values")
    print(len(df.columns[df.notna().all()]))
    
    print("Total no. of columns in the dataframe")
    print(len(df.columns))
    

    For example, my dataframe had 82 columns, of which 19 contained at least one null value.

    Further, you can also automatically remove columns and rows depending on which has more null values.
    The code below first drops every column whose null count exceeds the number of columns, then drops every remaining row that contains a null:

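    # Drop every column whose NaN count is greater than the number of columns
    df = df.drop(df.columns[df.isna().sum() > len(df.columns)], axis=1)
    # Then drop every row that still contains a NaN and renumber the index
    df = df.dropna(axis=0).reset_index(drop=True)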
    

    Note: The code above removes all of your null values. If you need to keep any of them, process them beforehand.
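
    To make the effect of that two-step cleanup concrete, here is a small sketch on made-up data (the frame and the names `too_sparse` and `cleaned` are assumptions for illustration). With 3 columns, only a column holding more than 3 NaNs is dropped; the remaining NaNs are then removed row by row:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 3 columns and 6 rows.
df = pd.DataFrame({
    "x": [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0],  # 4 NaNs
    "y": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],           # 1 NaN
    "z": [1, 2, 3, 4, 5, 6],                          # no NaN
})

# Step 1: flag columns whose NaN count exceeds the number of columns (3).
too_sparse = df.isna().sum() > len(df.columns)
cleaned = df.drop(columns=df.columns[too_sparse])

# Step 2: drop the rows that still contain a NaN and renumber the index.
cleaned = cleaned.dropna(axis=0).reset_index(drop=True)

print(cleaned.columns.tolist())  # ['y', 'z']
print(len(cleaned))              # 5
```

    Note that the threshold compares a per-column NaN count against the column count, so how aggressive it is depends on the shape of the frame.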

  • 2020-11-28 02:43

    UPDATE: using Pandas 0.22.0

    Newer Pandas versions have new methods 'DataFrame.isna()' and 'DataFrame.notna()'

    In [71]: df
    Out[71]:
         a    b  c
    0  NaN  7.0  0
    1  0.0  NaN  4
    2  2.0  NaN  4
    3  1.0  7.0  0
    4  1.0  3.0  9
    5  7.0  4.0  9
    6  2.0  6.0  9
    7  9.0  6.0  4
    8  3.0  0.0  9
    9  9.0  0.0  1
    
    In [72]: df.isna().any()
    Out[72]:
    a     True
    b     True
    c    False
    dtype: bool
    

    As a list of columns:

    In [74]: df.columns[df.isna().any()].tolist()
    Out[74]: ['a', 'b']
    

    To select those columns (containing at least one NaN value):

    In [73]: df.loc[:, df.isna().any()]
    Out[73]:
         a    b
    0  NaN  7.0
    1  0.0  NaN
    2  2.0  NaN
    3  1.0  7.0
    4  1.0  3.0
    5  7.0  4.0
    6  2.0  6.0
    7  9.0  6.0
    8  3.0  0.0
    9  9.0  0.0
    

    OLD answer:

    Try to use isnull():

    In [97]: df
    Out[97]:
         a    b  c
    0  NaN  7.0  0
    1  0.0  NaN  4
    2  2.0  NaN  4
    3  1.0  7.0  0
    4  1.0  3.0  9
    5  7.0  4.0  9
    6  2.0  6.0  9
    7  9.0  6.0  4
    8  3.0  0.0  9
    9  9.0  0.0  1
    
    In [98]: pd.isnull(df).sum() > 0
    Out[98]:
    a     True
    b     True
    c    False
    dtype: bool
    

    or, as @root proposed, a clearer version:

    In [5]: df.isnull().any()
    Out[5]:
    a     True
    b     True
    c    False
    dtype: bool
    
    In [7]: df.columns[df.isnull().any()].tolist()
    Out[7]: ['a', 'b']
    

    To select a subset, i.e. all columns containing at least one NaN value:

    In [31]: df.loc[:, df.isnull().any()]
    Out[31]:
         a    b
    0  NaN  7.0
    1  0.0  NaN
    2  2.0  NaN
    3  1.0  7.0
    4  1.0  3.0
    5  7.0  4.0
    6  2.0  6.0
    7  9.0  6.0
    8  3.0  0.0
    9  9.0  0.0
    
  • 2020-11-28 02:44

    This worked for me:

    1. To get the names of the columns having at least one null value:

    data.columns[data.isnull().any()]
    

    2. To get the null count of each column having at least one null value:

    data[data.columns[data.isnull().any()]].isnull().sum()
    

    [Optional] 3. To get the percentage of null values in each of those columns:

    data[data.columns[data.isnull().any()]].isnull().sum() * 100 / data.shape[0]
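
    The count and percentage from steps 2 and 3 can also be collected into one table. This is only a sketch on made-up data; the column labels `null_count` and `null_percent` are names chosen here for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with 4 rows.
data = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0, 1.0],
    "b": [7.0, np.nan, np.nan, 7.0],
    "c": [0, 4, 4, 0],
})

# Count nulls per column and keep only columns with at least one.
null_counts = data.isnull().sum()
null_counts = null_counts[null_counts > 0]

# Combine the count and the percentage of rows into one summary frame.
summary = pd.DataFrame({
    "null_count": null_counts,
    "null_percent": null_counts * 100 / data.shape[0],
})
print(summary)
```

    Here column `a` has 1 null out of 4 rows (25%) and column `b` has 2 (50%); column `c` is left out because it has none.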
    
  • 2020-11-28 02:45

    You can use df.isnull().sum(). It shows all columns and the total NaNs of each feature.
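
    A quick sketch of what that call returns, using a small made-up frame (the data and the name `nan_per_column` are assumptions for illustration). Unlike the filtered variants, columns without any NaN still appear, with a count of 0:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# One NaN count per column, including columns with zero NaNs.
nan_per_column = df.isnull().sum()

# Sorting puts the columns with the most NaNs first.
print(nan_per_column.sort_values(ascending=False))
```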

  • 2020-11-28 02:50

    I use these three lines of code to print the names of the columns that contain at least one null value, along with their null counts:

    for column in dataframe:
        if dataframe[column].isnull().any():
            print('{0} has {1} null values'.format(column, dataframe[column].isnull().sum()))
    