What is the best way to count NaN (not a number) values in a pandas DataFrame?
The following code:
import numpy as np
import pandas as pd
dfd = pd.DataFrame({'a': [3, 3, np.nan, 1, np.nan, 3]})
A good clean way to count all NaN's in all columns of your dataframe would be ...
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
print(df.isna().sum().sum())
The first sum gives the NaN count for each column; the second sum adds those per-column counts into a grand total.
If you want to count only the NaN values in column 'a' of a DataFrame df, use:
len(df) - df['a'].count()
Here count() gives the number of non-NaN values, and this is subtracted from the total number of rows (given by len(df)).
To count NaN values in every column of df, use:
len(df) - df.count()
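A minimal sketch of both forms, using a small made-up frame so the counts are easy to verify by eye:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan, 3, np.nan], 'b': [np.nan, 2, 3, 4]})

# NaN count for a single column: total rows minus non-NaN rows
print(len(df) - df['a'].count())   # 2

# NaN count for every column: len(df) broadcasts against the count() Series
print(len(df) - df.count())
# a    2
# b    1
# dtype: int64
```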
If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (added in pandas 0.14.1):
dfv = dfd['a'].value_counts(dropna=False)
This allows the missing values in the column to be counted too:
3 3
NaN 2
1 1
Name: a, dtype: int64
The rest of your code should then work as you expect (note that it's not necessary to call sum; just print("nan: %d" % dfv[np.nan]) suffices).
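Run end to end, with a made-up dfd whose column 'a' matches the counts shown above (three 3s, two NaNs, one 1):

```python
import numpy as np
import pandas as pd

dfd = pd.DataFrame({'a': [3, 3, np.nan, 1, np.nan, 3]})

# dropna=False keeps NaN as its own row in the result
dfv = dfd['a'].value_counts(dropna=False)

# NaN works as a label here because the index is float
print("nan: %d" % dfv[np.nan])  # nan: 2
```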
dfd['a'].isnull().value_counts()
returns:
True     695
False     60
Name: a, dtype: int64
- True: the count of null values
- False: the count of non-null values
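Sketched on a small made-up column, so both counts can be read off directly:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan, np.nan, 4]})

# isnull() gives a boolean Series; value_counts() then tallies True/False
counts = df['a'].isnull().value_counts()
print(counts[True])   # 2 null values
print(counts[False])  # 2 non-null values
```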
If you only want a summary of null values for each column, use the following code:
df.isnull().sum()
If you want to know how many null values there are in the whole data frame, use the following code:
df.isnull().sum().sum() # calculate total
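Both forms side by side on a small made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan, 3], 'b': [np.nan, np.nan, 3]})

# per-column summary of nulls
print(df.isnull().sum())
# a    1
# b    2
# dtype: int64

# grand total of nulls in the whole frame
print(df.isnull().sum().sum())  # 3
```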
To count just null values, you can use isnull():
In [11]:
dfd.isnull().sum()
Out[11]:
a 2
dtype: int64
Here a is the column name, and there are 2 occurrences of the null value in that column.
Yet another way to count all the nans in a df:
num_nans = df.size - df.count().sum()
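A quick sanity check (on a made-up frame) that this agrees with the isna-based count:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, np.nan], 'b': [np.nan, np.nan]})

# df.size is rows * columns; df.count() excludes NaNs,
# so the difference is the total number of NaNs
num_nans = df.size - df.count().sum()
print(num_nans)  # 3
assert num_nans == df.isna().sum().sum()
```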
Timings:
import timeit
import numpy as np
import pandas as pd
df_scale = 100000
df = pd.DataFrame(
    [[1, np.nan, 100, 63], [2, np.nan, 101, 63], [2, 12, 102, 63],
     [2, 14, 102, 63], [2, 14, 102, 64], [1, np.nan, 200, 63]] * df_scale,
    columns=['group', 'value', 'value2', 'dummy'])
repeat = 3
numbers = 100
setup = """import pandas as pd
from __main__ import df
"""
def timer(statement, _setup=None):
    print(min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))
timer('df.size - df.count().sum()')
timer('df.isna().sum().sum()')
timer('df.isnull().sum().sum()')
prints:
3.998805362999999
3.7503365439999996
3.689461442999999
so the three approaches are pretty much equivalent.