Pandas: how to sort dataframe by column AND by index

不思量自难忘° 2021-01-05 17:54

Given the DataFrame:

import pandas as pd
df = pd.DataFrame([6, 4, 2, 4, 5], index=[2, 6, 3, 4, 5], columns=['A'])

Results in:

   A
2  6
6  4
3  2
4  4
5  5

3 Answers
  • 2021-01-05 18:37

    Using lexsort from NumPy is another option, and it is somewhat faster as well:

    import numpy as np
    df.iloc[np.lexsort((df.index, df.A.values))] # Sort by A.values, then by index
    

    Result:

       A
    3  2
    4  4
    6  4
    5  5
    2  6
    

    Comparing with timeit:

    %%timeit
    df.iloc[np.lexsort((df.index, df.A.values))] # Sort by A.values, then by index
    

    Result:

    1000 loops, best of 3: 278 µs per loop
    

    For comparison, resetting the index, sorting, and setting the index again:

    %%timeit
    df.reset_index().sort_values(by=['A','index']).set_index('index')
    

    Result:

    100 loops, best of 3: 2.09 ms per loop
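
The key order passed to np.lexsort is easy to get backwards, so here is a minimal sketch on the question's DataFrame. The variable names are illustrative only and do not come from the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([6, 4, 2, 4, 5], index=[2, 6, 3, 4, 5], columns=["A"])

# np.lexsort treats the LAST key as the primary sort key,
# so A is compared first and the index only breaks ties.
order = np.lexsort((df.index, df["A"].to_numpy()))
by_a_then_index = df.iloc[order]

# Swapping the keys makes the index primary, which gives a different result.
by_index_then_a = df.iloc[np.lexsort((df["A"].to_numpy(), df.index))]

print(by_a_then_index.index.tolist())  # [3, 4, 6, 5, 2]
print(by_index_then_a.index.tolist())  # [2, 3, 4, 5, 6]
```

Since the last key wins, the column goes last in the tuple and the index goes first.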
    
  • 2021-01-05 18:41

    The other answers are great. I'll throw in one other option, which is to provide a name for the index first using rename_axis and then reference it in sort_values. I have not tested the performance but expect the accepted answer to still be faster.

    df.rename_axis('idx').sort_values(by=['A', 'idx'])

         A
    idx   
    3    2
    4    4
    6    4
    5    5
    2    6
    

    The sorted result keeps the index name idx. To clear it, chain .rename_axis(None) onto the expression, or set index.name = None on the result.
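
The rename, sort, and restore steps can be wrapped in a small helper. This is only a sketch: the function name and the tmp_name parameter are hypothetical, and it assumes a pandas version in which sort_values accepts index level names in by (0.23 or later).

```python
import pandas as pd


def sort_by_column_then_index(frame, column, tmp_name="_idx"):
    """Sort by `column`, breaking ties by index, and keep the original index name.

    `tmp_name` is a hypothetical placeholder; it must not clash with a column label.
    """
    original_name = frame.index.name
    return (
        frame.rename_axis(tmp_name)        # give the index a name sort_values can see
        .sort_values(by=[column, tmp_name])
        .rename_axis(original_name)        # restore the previous name (None if unnamed)
    )


df = pd.DataFrame([6, 4, 2, 4, 5], index=[2, 6, 3, 4, 5], columns=["A"])
result = sort_by_column_then_index(df, "A")
print(result.index.tolist())  # [3, 4, 6, 5, 2]
print(result.index.name)      # None
```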

  • 2021-01-05 18:59

    You can sort by index and then by column A using kind='mergesort'.

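    This works because mergesort is stable: rows with equal values in A keep the relative order they had after the first sort, which is index order.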

    res = df.sort_index().sort_values('A', kind='mergesort')
    

    Result:

       A
    3  2
    4  4
    6  4
    5  5
    2  6
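
To check that the two-pass approach holds up when there are many ties, here is a sketch that compares it with a single lexsort pass. The random data and the seed are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical test data: many repeated values in A and a shuffled, unique index.
df = pd.DataFrame({"A": rng.integers(0, 5, size=1000)}, index=rng.permutation(1000))

# Pass 1 puts rows in index order; pass 2 is stable, so rows with
# equal A keep that index order relative to each other.
two_pass = df.sort_index().sort_values("A", kind="mergesort")

# Reference result: one pass with explicit keys (A primary, index secondary).
reference = df.iloc[np.lexsort((df.index, df["A"].to_numpy()))]

print(two_pass.index.equals(reference.index))
```

Without kind='mergesort', the second pass uses the default quicksort, which does not guarantee that tied rows stay in index order.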
    