Sort rows of a dataframe in descending order of NaN counts

栀梦 2021-01-13 12:41

I'm trying to sort the following Pandas DataFrame:

         RHS  age  height  shoe_size  weight
0     weight  NaN     0.0        0.0     1.0
1  shoe_size  N         


        
4 Answers
  • 2021-01-13 12:51

    You can add a column holding the number of null values in each row, sort by that column, then drop it. Whether to chain .reset_index(drop=True) afterwards to renumber the row index is up to you.

    df['null_count'] = df.isnull().sum(axis=1)
    df.sort_values('null_count', ascending=False).drop('null_count', axis=1)
    
    # returns
             RHS  age  height  shoe_size  weight
    1  shoe_size  NaN     0.0        1.0     NaN
    0     weight  NaN     0.0        0.0     1.0
    2  shoe_size  3.0     0.0        0.0     NaN
    3     weight  3.0     0.0        0.0     1.0
    4        age  3.0     0.0        0.0     1.0
    
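As a self-contained sketch of the same three steps: the frame below is reconstructed from the outputs printed in this thread, and the names tmp, sorted_df and renumbered are illustrative only. Working on a copy is an assumption about what you want; it keeps the helper column off the original frame.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the outputs shown in this thread.
df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
})

# Work on a copy so the helper column never lands on the original frame.
tmp = df.copy()
tmp["null_count"] = tmp.isnull().sum(axis=1)

# Sort by the helper column (most NaNs first), then drop it again.
sorted_df = tmp.sort_values("null_count", ascending=False).drop("null_count", axis=1)

# Optional: renumber the rows 0..n-1 instead of keeping the original labels.
renumbered = sorted_df.reset_index(drop=True)

print(sorted_df)
print(renumbered)
```

sorted_df keeps the original row labels, which is handy if you need to trace rows back; renumbered discards them.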
  • 2021-01-13 13:05

    Use sort_values on the per-row NaN counts, then select the rows by label with loc.

    df = df.loc[df.isnull().sum(axis=1).sort_values(ascending=False).index]
    print(df)
    
             RHS  age  height  shoe_size  weight
    1  shoe_size  NaN     0.0        1.0     NaN
    2  shoe_size  3.0     0.0        0.0     NaN
    0     weight  NaN     0.0        0.0     1.0
    4        age  3.0     0.0        0.0     1.0
    3     weight  3.0     0.0        0.0     1.0
    

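    df.isnull().sum(axis=1) counts the NaNs in each row, and the rows are then selected in the order of that sorted count. The default quicksort is not stable, so rows with the same count may appear in any relative order.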


    @ayhan offered a nice little improvement to the solution above, involving pd.Series.argsort:

    df = df.iloc[df.isnull().sum(axis=1).mul(-1).argsort()]
    print(df)
    
             RHS  age  height  shoe_size  weight
    1  shoe_size  NaN     0.0        1.0     NaN
    0     weight  NaN     0.0        0.0     1.0
    2  shoe_size  3.0     0.0        0.0     NaN
    3     weight  3.0     0.0        0.0     1.0
    4        age  3.0     0.0        0.0     1.0
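
The difference between label-based and position-based selection in this answer only shows up when the index is not the default 0..n-1. Here is a small sketch of both routes. The frame is reconstructed from the outputs printed in this thread; the string labels "a" to "e" are hypothetical, added only so that labels and positions differ, and kind="stable" is my choice to make the tie order predictable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
        "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
        "height": [0.0, 0.0, 0.0, 0.0, 0.0],
        "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
        "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
    },
    index=list("abcde"),  # hypothetical labels, not from the original question
)

counts = df.isnull().sum(axis=1)

# sort_values keeps the row labels, so its index has to go through .loc.
by_label = df.loc[counts.sort_values(ascending=False).index]

# argsort yields integer positions, so it goes through .iloc.
# Negating the counts turns the ascending argsort into a descending order,
# and kind="stable" keeps tied rows in their original relative order.
positions = np.argsort(-counts.to_numpy(), kind="stable")
by_position = df.iloc[positions]

print(by_label)
print(by_position)
```

With these labels, handing the sort_values index to iloc would fail, because "a" is a label and not an integer position.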
    
  • 2021-01-13 13:06

    Here's a one-liner that will do it:

    df.assign(Count_NA = lambda x: x.isnull().sum(axis=1)).sort_values('Count_NA', ascending=False).drop('Count_NA', axis=1)
    #          RHS  age  height  shoe_size  weight
    # 1  shoe_size  NaN     0.0        1.0     NaN
    # 0     weight  NaN     0.0        0.0     1.0
    # 2  shoe_size  3.0     0.0        0.0     NaN
    # 3     weight  3.0     0.0        0.0     1.0
    # 4        age  3.0     0.0        0.0     1.0
    

    This works by assigning a temporary column ("Count_NA") to count the NAs in each row, sorting on that column, and then dropping it, all in the same expression.
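
If you need this in more than one place, the same chain can be wrapped in a small function. This is only a sketch: sort_rows_by_nan_count and the temporary column name _nan_count are hypothetical names, and it assumes the frame has no existing column called _nan_count. The sample frame is reconstructed from the outputs printed in this thread.

```python
import numpy as np
import pandas as pd


def sort_rows_by_nan_count(frame, ascending=False):
    """Return a new frame with rows ordered by how many NaNs each row has."""
    return (
        frame
        # assign returns a new frame, so the input is left untouched.
        .assign(_nan_count=lambda x: x.isnull().sum(axis=1))
        .sort_values("_nan_count", ascending=ascending)
        .drop(columns="_nan_count")
    )


df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
})

most_nans_first = sort_rows_by_nan_count(df)
fewest_nans_first = sort_rows_by_nan_count(df, ascending=True)

print(most_nans_first)
print(fewest_nans_first)
```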

  • 2021-01-13 13:06

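    # Note: with the default axis=0 this counts NaNs per column and returns the sorted counts;
    # pass axis=1 to count per row. It does not reorder the rows of df by itself.
    df.isnull().sum().sort_values(ascending=False)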
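
For comparison, here is a sketch of what the axis argument changes in this expression, and one possible way to get from the per-row counts back to a reordered frame. The frame is reconstructed from the outputs printed in this thread; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "RHS": ["weight", "shoe_size", "shoe_size", "weight", "age"],
    "age": [np.nan, np.nan, 3.0, 3.0, 3.0],
    "height": [0.0, 0.0, 0.0, 0.0, 0.0],
    "shoe_size": [0.0, 1.0, 0.0, 0.0, 0.0],
    "weight": [1.0, np.nan, np.nan, 1.0, 1.0],
})

# Default axis=0: one count per column, indexed by column name.
per_column = df.isnull().sum()

# axis=1: one count per row, indexed by row label.
per_row = df.isnull().sum(axis=1)

# The sorted per-row counts carry the row labels, which .loc can use
# to reorder the frame itself.
reordered = df.loc[per_row.sort_values(ascending=False).index]

print(per_column)
print(per_row)
print(reordered)
```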
