How to count nan values in a pandas DataFrame?

天涯浪人 2020-12-18 18:40

What is the best way to count NaN (not a number) values in a pandas DataFrame?

The following code:

import numpy as np
import pandas as pd
dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
dfv = dfd['a'].value_counts()
print("nan: %d" % dfv[np.nan].sum())

fails with a KeyError, because value_counts() drops NaN values by default, so np.nan never appears in the index of dfv. How can I count the NaN values as well?

7 Answers
  • 2020-12-18 19:09

    A good clean way to count all NaN's in all columns of your dataframe would be ...

    import pandas as pd 
    import numpy as np
    
    
    df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
    print(df.isna().sum().sum())
    

    The first sum gives the count of NaN's for each column; the second sum adds those per-column counts into a single total.
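
    To see the intermediate step for the df above, the inner sum alone gives the per-column counts:

    print(df.isna().sum())
    # a    1
    # b    2
    # dtype: int64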

  • 2020-12-18 19:16

    If you want to count only NaN values in column 'a' of a DataFrame df, use:

    len(df) - df['a'].count()
    

    Here count() tells us the number of non-NaN values, and this is subtracted from the total number of values (given by len(df)).

    To count NaN values in every column of df, use:

    len(df) - df.count()
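
    As a quick check, here is a minimal sketch assuming the six-row dfd from the question:

    import numpy as np
    import pandas as pd

    dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
    print(len(dfd) - dfd['a'].count())   # 2 (NaNs in column 'a')
    print(len(dfd) - dfd.count())        # per-column NaN counts, as a Series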
    

    If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (added in 0.14.1):

    dfv = dfd['a'].value_counts(dropna=False)
    

    This allows the missing values in the column to be counted too:

     3     3
    NaN    2
     1     1
    Name: a, dtype: int64
    

    The rest of your code should then work as you expect (note that it's not necessary to call sum; just print("nan: %d" % dfv[np.nan]) suffices).
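
    Putting it together, a sketch of the corrected code from the question (again assuming the dfd above):

    dfv = dfd['a'].value_counts(dropna=False)
    print("nan: %d" % dfv[np.nan])   # nan: 2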

  • 2020-12-18 19:20
    dfd['a'].isnull().value_counts()
    

    returns:

    True     695
    False     60
    Name: a, dtype: int64

    Here True is the count of null values and False is the count of non-null values.
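
    If you only need the null count out of that result, one option (not in the original answer, just a convenience) is Series.get, which falls back to a default when the column has no nulls at all:

    null_count = dfd['a'].isnull().value_counts().get(True, 0)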
  • 2020-12-18 19:22

    If you only want the per-column summary of null values, use:

    df.isnull().sum()

    If you want the total number of null values in the whole DataFrame, use:

    df.isnull().sum().sum()  # calculate total

  • 2020-12-18 19:23

    To count just null values, you can use isnull():

    In [11]:
    dfd.isnull().sum()
    
    Out[11]:
    a    2
    dtype: int64
    

    Here a is the column name, and there are 2 null values in that column.

  • 2020-12-18 19:25

    Yet another way to count all the NaNs in a DataFrame:

    num_nans = df.size - df.count().sum()

    df.size is the total number of cells (rows × columns), and df.count().sum() is the total number of non-NaN cells, so the difference is the NaN count.

    Timings:

    import timeit
    
    import numpy as np
    import pandas as pd
    
    df_scale = 100000
    df = pd.DataFrame(
        [[1, np.nan, 100, 63], [2, np.nan, 101, 63], [2, 12, 102, 63],
         [2, 14, 102, 63], [2, 14, 102, 64], [1, np.nan, 200, 63]] * df_scale,
        columns=['group', 'value', 'value2', 'dummy'])
    
    repeat = 3
    numbers = 100
    
    setup = """import pandas as pd
    from __main__ import df
    """
    
    def timer(statement, _setup=None):
        # report the best of `repeat` runs, each executing the statement `numbers` times
        print(min(
            timeit.Timer(statement, setup=_setup or setup).repeat(
                repeat, numbers)))
    
    timer('df.size - df.count().sum()')
    timer('df.isna().sum().sum()')
    timer('df.isnull().sum().sum()')
    

    prints:

    3.998805362999999
    3.7503365439999996
    3.689461442999999
    

    So the three approaches are essentially equivalent in speed.
