How do I get a summary count of missing/NaN data by column in 'pandas'?

梦谈多话 2020-12-23 17:06

In R I can quickly see a count of missing data using the summary command, but the equivalent pandas DataFrame method, describe

5 Answers
  • 2020-12-23 17:44

    As a tiny addition, to get percentage missing by DataFrame column, combining @Jeff and @userS's answers above gets you:

    df.isnull().sum()/len(df)*100
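
    As a minimal sketch of the same idea, the percentage can also be taken from the mean of the boolean mask, since the mean of True/False values is the fraction that are True. The frame below is made-up illustration data, not from the question.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: column "a" has 1 of 4 values missing.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0], "b": [1.0, 2.0, 3.0, 4.0]})

# isnull() yields booleans; their mean is the fraction of missing values.
pct_missing = df.isnull().mean() * 100

# Same quantity computed as in the answer above.
pct_missing_alt = df.isnull().sum() / len(df) * 100

print(pct_missing)
```

    Both expressions should give one percentage per column; which one to use is mostly a matter of taste.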
    
  • 2020-12-23 17:45

    The following will do the trick and returns the count of nulls for every column:

    df.isnull().sum(axis=0)

    df.isnull() returns a dataframe with True / False values
    sum(axis=0) sums the values across all rows for a column
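
    To make the role of the axis argument concrete, here is a small sketch on made-up data that contrasts axis=0 (one count per column) with axis=1 (one count per row).

```python
import numpy as np
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [np.nan, 2.0, 3.0],
    "c": [1.0, 2.0, 3.0],
})

mask = df.isnull()             # DataFrame of True / False values

per_column = mask.sum(axis=0)  # adds down the rows: one count per column
per_row = mask.sum(axis=1)     # adds across the columns: one count per row

print(per_column)
print(per_row)
```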

  • 2020-12-23 17:48

    If you don't care which columns have NaNs and just want to check overall, add a second .sum() to get a single value.

    result = df.isnull().sum().sum()
    result > 0
    

    A Series would need only one .sum(), and a Panel (a 3-D container since removed from pandas) would need three.
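
    A short sketch of the overall check on made-up data. It also shows a yes/no variant using .values.any(), which is an alternative not mentioned in the answer and skips the counting step.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: exactly one missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# DataFrame: the first sum is per column, the second collapses to one number.
total_missing = df.isnull().sum().sum()

# Series: a single sum is enough.
series_missing = df["a"].isnull().sum()

# Alternative yes/no check on the underlying boolean array.
has_missing = bool(df.isnull().values.any())

print(total_missing, series_missing, has_missing)
```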

  • 2020-12-23 17:49

    Both describe and info report the count of non-missing values.

    In [1]: import numpy as np; from pandas import DataFrame; df = DataFrame(np.random.randn(10,2))
    
    In [2]: df.iloc[3:6,0] = np.nan
    
    In [3]: df
    Out[3]: 
              0         1
    0 -0.560342  1.862640
    1 -1.237742  0.596384
    2  0.603539 -1.561594
    3       NaN  3.018954
    4       NaN -0.046759
    5       NaN  0.480158
    6  0.113200 -0.911159
    7  0.990895  0.612990
    8  0.668534 -0.701769
    9 -0.607247 -0.489427
    
    [10 rows x 2 columns]
    
    In [4]: df.describe()
    Out[4]: 
                  0          1
    count  7.000000  10.000000
    mean  -0.004166   0.286042
    std    0.818586   1.363422
    min   -1.237742  -1.561594
    25%   -0.583795  -0.648684
    50%    0.113200   0.216699
    75%    0.636036   0.608839
    max    0.990895   3.018954
    
    [8 rows x 2 columns]
    
    
    In [5]: df.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 10 entries, 0 to 9
    Data columns (total 2 columns):
    0    7 non-null float64
    1    10 non-null float64
    dtypes: float64(2)
    

    To get a count of missing values, your solution is correct:

    In [20]: len(df.index)-df.count()
    Out[20]: 
    0    3
    1    0
    dtype: int64
    

    You could do this too:

    In [23]: df.isnull().sum()
    Out[23]: 
    0    3
    1    0
    dtype: int64
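
    Putting the pieces together, a hedged sketch of a describe-like summary table. The column names non_null, missing and pct_missing are illustrative choices, not something pandas provides.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the session above: 3 NaNs in column 0.
df = pd.DataFrame(np.random.randn(10, 2))
df.iloc[3:6, 0] = np.nan

# One row per original column, with present and missing counts side by side.
summary = pd.DataFrame({
    "non_null": df.count(),                   # what describe() and info() report
    "missing": df.isnull().sum(),             # same as len(df.index) - df.count()
    "pct_missing": df.isnull().mean() * 100,  # share of rows that are missing
})

print(summary)
```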
    
  • 2020-12-23 17:50

    This isn't quite a full summary, but it will give you a quick sense of your column-level data:

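    def getPctMissing(series):
        num = series.isnull().sum()  # number of missing values
        den = len(series)            # total number of values, missing or not
        return 100*(num/den)         # percentage of the column that is missing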
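
    A minimal sketch of running a per-column helper like this over a whole frame with DataFrame.apply. The name pct_missing and the data are illustrative; the helper here divides by the total number of rows, so a fully missing column comes out as 100.

```python
import numpy as np
import pandas as pd

def pct_missing(series):
    """Percentage of entries in the series that are missing."""
    return 100 * series.isnull().sum() / len(series)

# Made-up data for illustration: "a" is half missing, "b" is entirely missing.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, np.nan, np.nan, np.nan],
})

# apply() calls the helper once per column and collects the results in a Series.
result = df.apply(pct_missing)

print(result)
```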
    