Pandas DataFrame sort by categorical column but by specific class ordering

醉话见心 2020-12-03 18:52

I would like to select the top entries in a Pandas DataFrame, based on the entries of a specific column, by using df_selected = df_targets.head(N).

Each

3 Answers
  • 2020-12-03 19:10

    The method shown in my previous answer is now deprecated.

    Instead, it is best to use pandas.Categorical as shown here.

    So:

    list_ordering = ["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"]  
    df["target"] = pd.Categorical(df["target"], categories=list_ordering) 
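
    The conversion alone does not reorder the rows, so to get back to the original goal of df.head(N) the column still has to be sorted. A minimal sketch, assuming a small made-up DataFrame (the sample rows, the voter_id column, and the value of N are illustrative and not from the question):

```python
import pandas as pd

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Hypothetical sample data; only the "target" labels come from the thread.
df = pd.DataFrame({
    "target": ["GOTV", "Persuasion", "Likely Supporter",
               "GOTV", "Persuasion", "Persuasion+GOTV"],
    "voter_id": [101, 102, 103, 104, 105, 106],
})

# Attach the custom ordering to the column.
df["target"] = pd.Categorical(df["target"], categories=list_ordering, ordered=True)

# Sort by the custom category order, then keep the first N rows.
# mergesort is stable, so rows with equal categories keep their original order.
N = 3
df_selected = df.sort_values("target", kind="mergesort").head(N)
print(df_selected)
```

    Sorting returns a new DataFrame here, so df itself keeps its original row order.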
    
  • 2020-12-03 19:13

    Thanks to jerzrael's input and references,

    I like this sliced solution:

    list_ordering = ["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"]  
    
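    # Originally written as .astype("category", categories=list_ordering, ordered=True);
    # those keyword arguments were deprecated and later removed, so a CategoricalDtype is passed instead.
    df["target"] = df["target"].astype(pd.CategoricalDtype(categories=list_ordering, ordered=True))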
    
  • 2020-12-03 19:17

    I think you need Categorical with the parameter ordered=True; sorting with sort_values then works nicely.

    From the documentation of Categorical:

    Ordered Categoricals can be sorted according to the custom order of the categories and can have a min and max value.

    import pandas as pd
    
    df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter', 
                             'GOTV', 'Persuasion', 'Persuasion+GOTV']})
    
    df.a = pd.Categorical(df.a, 
                          categories=["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"],
                          ordered=True)
    
    print (df)
                      a
    0              GOTV
    1        Persuasion
    2  Likely Supporter
    3              GOTV
    4        Persuasion
    5   Persuasion+GOTV
    
    print (df.a)
    0                GOTV
    1          Persuasion
    2    Likely Supporter
    3                GOTV
    4          Persuasion
    5     Persuasion+GOTV
    Name: a, dtype: category
    Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]
    
    df.sort_values('a', inplace=True)
    print (df)
                      a
    2  Likely Supporter
    0              GOTV
    3              GOTV
    1        Persuasion
    4        Persuasion
    5   Persuasion+GOTV
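
    The quoted note about min and max can be checked with a short sketch. It also shows one caveat worth knowing: a label that is missing from the categories list becomes NaN. The "Unknown" label below is made up for illustration and does not appear in the question:

```python
import pandas as pd

order = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# "Unknown" is a hypothetical label that is not part of the ordering.
labels = pd.Series(["GOTV", "Persuasion", "Likely Supporter", "Unknown"])
cat = pd.Series(pd.Categorical(labels, categories=order, ordered=True))

# Labels outside the category list are converted to NaN.
n_missing = int(cat.isna().sum())

# min and max follow the custom order, not alphabetical order.
known = cat.dropna()
lowest = known.min()
highest = known.max()

print(n_missing, lowest, highest)
```

    Checking for NaN after the conversion is a cheap way to catch misspelled or unexpected labels before sorting.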
    