How to sort a pandas DataFrame by a custom order on a string index

北恋 2020-12-15 22:32

I have the following data frame:

import pandas as pd

# Create DataFrame
df = pd.DataFrame(
{'id': [2967, 5335, 13950, 6141, 6169],
 'Player': ['Cedric


        
3 Answers
  • 2020-12-15 22:54

    Just reindex

    df.reindex(reorderlist)
    Out[89]: 
                     Age   G   Tm  Year     id
    Player                                    
    Maurice Baker     25   7  VAN  2004   5335
    Adrian Caldwell   31  81  DAL  1997   6169
    Ratko Varda       22  60  TOT  2001  13950
    Ryan Bowen        34  52  OKC  2009   6141
    Cedric Hunter     27   6  CHH  1991   2967
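
For a self-contained version of the reindex approach, here is a minimal sketch. The data is reconstructed from the output table above, the row order is assumed to follow the id list in the question, and `reorderlist` is assumed to hold the player names in the desired order.

```python
import pandas as pd

# Data reconstructed from the output shown above (an assumption, since the
# question's own DataFrame definition is cut off).
df = pd.DataFrame(
    {
        "id": [2967, 5335, 13950, 6141, 6169],
        "Player": ["Cedric Hunter", "Maurice Baker", "Ratko Varda",
                   "Ryan Bowen", "Adrian Caldwell"],
        "Year": [1991, 2004, 2001, 2009, 1997],
        "Age": [27, 25, 22, 34, 31],
        "Tm": ["CHH", "VAN", "TOT", "OKC", "DAL"],
        "G": [6, 7, 60, 52, 81],
    }
).set_index("Player")  # reindex works on the index, so Player must be the index

# Desired row order, expressed as index labels.
reorderlist = ["Maurice Baker", "Adrian Caldwell", "Ratko Varda",
               "Ryan Bowen", "Cedric Hunter"]

# Rows are returned in exactly the order of the labels passed in.
df_reordered = df.reindex(reorderlist)
print(df_reordered)
```

Keep in mind that reindex is not strictly a sort: labels in the list that are missing from the index produce rows filled with NaN, and rows whose label is not in the list are dropped.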
    
  • 2020-12-15 23:02

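    To get a custom sort order on your list of strings, declare it as an ordered categorical and pass the desired order explicitly through categories (if categories is omitted, pandas infers the categories in sorted order, which for strings is alphabetical):

    order = ['Maurice Baker', 'Adrian Caldwell', 'Ratko Varda', 'Ryan Bowen', 'Cedric Hunter']
    player_order = pd.Categorical(order, categories=order, ordered=True)
    

    Passing the Categorical directly to set_index does not work: df.set_index(keys=player_order, inplace=True) fails with a TypeError (reported here as unhashable type: 'Categorical'). pandas does support categorical indexes, though, through pd.CategoricalIndex.

    So convert the Player index to an ordered CategoricalIndex with these categories and then call df.sort_index() with no arguments. Note that the level parameter of sort_index expects index level names or positions, so df.sort_index(level=player_order) will not apply the custom order.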
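
A minimal sketch of sorting by an ordered categorical index, assuming Player is already the index. It uses pd.CategoricalIndex with an explicit categories list; the small DataFrame (ids only) and the variable names are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Desired order of the index labels.
order = ["Maurice Baker", "Adrian Caldwell", "Ratko Varda",
         "Ryan Bowen", "Cedric Hunter"]

# Illustrative frame with Player as a plain string index.
df = pd.DataFrame(
    {"id": [2967, 5335, 13950, 6141, 6169]},
    index=pd.Index(
        ["Cedric Hunter", "Maurice Baker", "Ratko Varda",
         "Ryan Bowen", "Adrian Caldwell"],
        name="Player",
    ),
)

# Replace the string index with an ordered categorical index whose
# category order is the custom order.
df.index = pd.CategoricalIndex(df.index, categories=order, ordered=True, name="Player")

# sort_index now follows the category order rather than alphabetical order.
df_sorted = df.sort_index()
print(df_sorted)
```

Any index label that is not listed in categories becomes NaN in the categorical index, so the order list should cover every player.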

  • 2020-12-15 23:13

    As of Pandas 1.1 DataFrame.sort_values has a key param that takes a callable to control sorting. So you could use an approach like the following:

    def sorter(column):
        reorder = [
            "Maurice Baker",
            "Adrian Caldwell",
            "Ratko Varda",
            "Ryan Bowen",
            "Cedric Hunter",
        ]
        # This also works:
        # mapper = {name: order for order, name in enumerate(reorder)}
        # return column.map(mapper)
        cat = pd.Categorical(column, categories=reorder, ordered=True)
        return pd.Series(cat)
    
    df_sorted = df.sort_values(by="Player", key=sorter)
    

    There may be some practical differences between using pd.Categorical and the column.map alternative I put in the comments. For example, see these caveats. I'm showing both for completeness. I also haven't tested how this compares performance-wise to the current accepted solution that uses df.reindex. The best approach might be different when you have a MultiIndex in play too.
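
Since the question is about a string index, the same idea can be sketched with DataFrame.sort_index, which also accepts a key callable as of pandas 1.1. This is an illustrative variant rather than part of the answer above: the ids-only DataFrame and the `rank` mapping are assumptions made for the example.

```python
import pandas as pd

reorder = ["Maurice Baker", "Adrian Caldwell", "Ratko Varda",
           "Ryan Bowen", "Cedric Hunter"]

# Map each name to its position in the desired order.
rank = {name: position for position, name in enumerate(reorder)}

# Illustrative frame with Player as the index.
df = pd.DataFrame(
    {"id": [2967, 5335, 13950, 6141, 6169]},
    index=pd.Index(
        ["Cedric Hunter", "Maurice Baker", "Ratko Varda",
         "Ryan Bowen", "Adrian Caldwell"],
        name="Player",
    ),
)

# The key receives the Index and must return something of the same length;
# here each label is replaced by its rank, and rows are sorted by that rank.
df_sorted = df.sort_index(key=lambda index: index.map(rank))
print(df_sorted)
```

The index keeps its original string dtype here, because the key only affects the comparison used for sorting, not the stored labels.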
