Sort rows of a dataframe in descending order of NaN counts

前端未结

关注

 4  2036

I\'m trying to sort the following Pandas DataFrame:

         RHS  age  height  shoe_size  weight
0     weight  NaN     0.0        0.0     1.0
1  shoe_size  N


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  没有蜡笔的小新        
                
              
                            
                2021-01-13 12:51
              
            
            
                                                                       
You can add a column of the number of null values, sort by that column, then drop the column.  It's up to you if you want to use .reset_index(drop=True) to reset the row count.

df['null_count'] = df.isnull().sum(axis=1)
df.sort_values('null_count', ascending=False).drop('null_count', axis=1)

# returns
         RHS  age  height  shoe_size  weight
1  shoe_size  NaN     0.0        1.0     NaN
0     weight  NaN     0.0        0.0     1.0
2  shoe_size  3.0     0.0        0.0     NaN
3     weight  3.0     0.0        0.0     1.0
4        age  3.0     0.0        0.0     1.0

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  小蘑菇        
                
              
                            
                2021-01-13 13:05
              
            
            
                                                                       
Using df.sort_values and loc based accessing. 

df = df.iloc[df.isnull().sum(1).sort_values(ascending=0).index]
print(df)

         RHS  age  height  shoe_size  weight
1  shoe_size  NaN     0.0        1.0     NaN
2  shoe_size  3.0     0.0        0.0     NaN
0     weight  NaN     0.0        0.0     1.0
4        age  3.0     0.0        0.0     1.0
3     weight  3.0     0.0        0.0     1.0


df.isnull().sum(1) counts the NaNs and the rows are accessed based on this sorted count.



@ayhan offered a nice little improvement to the solution above, involving pd.Series.argsort:

df = df.iloc[df.isnull().sum(axis=1).mul(-1).argsort()]
print(df)

         RHS  age  height  shoe_size  weight 
1  shoe_size  NaN     0.0        1.0     NaN           
0     weight  NaN     0.0        0.0     1.0           
2  shoe_size  3.0     0.0        0.0     NaN           
3     weight  3.0     0.0        0.0     1.0           
4        age  3.0     0.0        0.0     1.0            

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  陌清茗        
                
              
                            
                2021-01-13 13:06
              
            
            
                                                                       
Here's a one-liner that will do it:

df.assign(Count_NA = lambda x: x.isnull().sum(axis=1)).sort_values('Count_NA', ascending=False).drop('Count_NA', axis=1)
#          RHS  age  height  shoe_size  weight
# 1  shoe_size  NaN     0.0        1.0     NaN
# 0     weight  NaN     0.0        0.0     1.0
# 2  shoe_size  3.0     0.0        0.0     NaN
# 3     weight  3.0     0.0        0.0     1.0
# 4        age  3.0     0.0        0.0     1.0


This works by assigning a temporary column ("Count_NA") to count the NAs in each row, sorting on that column, and then dropping it, all in the same expression.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一向        
                
              
                            
                2021-01-13 13:06
              
            
            
                                                                       
df.isnull().sum().sort_values(ascending=False)
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复