Pandas nested sort and NaN

一向 2021-01-02 09:32

I'm trying to understand the expected behavior of DataFrame.sort on columns with NaN values.

Given this DataFrame:

In [36]: df
Out[36]: 
    a   b
0         


        
1 answer
  • 2021-01-02 10:21

    Until this is fixed in pandas, I'm using the function below for my sorting needs. It implements a subset of the original DataFrame.sort functionality and works for numerical values only:

    import numpy as np

    def dataframe_sort(df, columns, ascending=True):
        a = np.array(df[columns])
    
        # One sign per column: 1 if ascending, -1 if descending
        if isinstance(ascending, bool):
            ascending = len(columns) * [ascending]
        signs = [1 if asc else -1 for asc in ascending]
    
        # np.lexsort treats the last key as the primary one, so pass keys in reverse.
        # NaN stays NaN after negation, so NaN rows always sort last.
        ind = np.lexsort([signs[i] * a[:, i] for i in reversed(range(len(columns)))])
        return df.iloc[ind]
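
The function above relies on two properties of np.lexsort: the last key in the sequence is the primary sort key, and NaN values sort to the end. Because negating NaN still gives NaN, flipping the sign of a key reverses the order of the real numbers while leaving NaN last. A minimal sketch of both properties, using made-up arrays (`primary` and `secondary` are illustrative names, not from the post):

```python
import numpy as np

# Two hypothetical keys: 'primary' is the main sort key, 'secondary' breaks ties.
primary = np.array([2.0, np.nan, 1.0, 1.0])
secondary = np.array([5.0, 0.0, 9.0, 3.0])

# np.lexsort treats the LAST key in the sequence as the primary one.
order_asc = np.lexsort([secondary, primary])

# Negating a key reverses its order; -NaN is still NaN, so it stays last.
order_desc = np.lexsort([secondary, -primary])

print(order_asc)   # positions sorted by primary ascending, NaN last
print(order_desc)  # positions sorted by primary descending, NaN last
```

In both results the position holding NaN (index 1) comes last, which is why the function keeps NaN rows at the bottom regardless of sort direction.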
    

    Usage example:

    In [4]: df
    Out[4]: 
         a   b   c
    10   1   9   7
    11 NaN NaN   1
    12   2 NaN   6
    13 NaN   5   6
    14   1   2   6
    15   6   5 NaN
    16   8   4   4
    17   4   5   3
    
    In [5]: dataframe_sort(df, ['a', 'c'], False)
    Out[5]: 
         a   b   c
    16   8   4   4
    15   6   5 NaN
    17   4   5   3
    12   2 NaN   6
    10   1   9   7
    14   1   2   6
    13 NaN   5   6
    11 NaN NaN   1
    
    In [6]: dataframe_sort(df, ['b', 'a'], [False, True])
    Out[6]: 
         a   b   c
    10   1   9   7
    17   4   5   3
    15   6   5 NaN
    13 NaN   5   6
    16   8   4   4
    14   1   2   6
    12   2 NaN   6
    11 NaN NaN   1
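
For readers on a recent pandas release, where DataFrame.sort has been replaced by DataFrame.sort_values, the built-in method may already give the same ordering, so it is worth trying before falling back to the workaround. The following is a sketch only, assuming a current pandas version; the frame is rebuilt by hand from the usage example above.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the usage example (values copied by hand).
df = pd.DataFrame(
    {
        "a": [1, np.nan, 2, np.nan, 1, 6, 8, 4],
        "b": [9, np.nan, np.nan, 5, 2, 5, 4, 5],
        "c": [7, 1, 6, 6, 6, np.nan, 4, 3],
    },
    index=range(10, 18),
)

# Sort by 'a' then 'c', both descending; na_position="last" keeps NaN rows at the end.
by_a_c = df.sort_values(["a", "c"], ascending=False, na_position="last")

# Mixed directions: 'b' descending, 'a' ascending.
by_b_a = df.sort_values(["b", "a"], ascending=[False, True], na_position="last")

print(by_a_c)
print(by_b_a)
```

If the printed row order differs from Out[5] and Out[6] above on your pandas version, the dataframe_sort workaround remains the safer choice.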
    