Sort a pandas's dataframe series by month name?

前端 未结 6 1708
无人及你
无人及你 2020-12-01 06:09

I have a Series object that has:

    date   price
    dec      12
    may      15
    apr      13
    ..

Problem statement:

相关标签:
6条回答
  • 2020-12-01 06:23

    use Sort_Dataframeby_Month function to sort month names in chronological order

    Packages need to install.

    $ pip install sorted-months-weekdays
    $ pip install sort-dataframeby-monthorweek
    

    example:

    from sorted_months_weekdays import *
    
    from sort_dataframeby_monthorweek import *
    
    df = pd.DataFrame([['Jan',23],['Jan',16],['Dec',35],['Apr',79],['Mar',53],['Mar',12],['Feb',3]], columns=['Month','Sum'])
    df
    Out[11]: 
      Month  Sum
    0   Jan   23
    1   Jan   16
    2   Dec   35
    3   Apr   79
    4   Mar   53
    5   Mar   12
    6   Feb    3
    

    To sort dataframe by Month use below function

    Sort_Dataframeby_Month(df=df,monthcolumnname='Month')
    Out[14]: 
      Month  Sum
    0   Jan   23
    1   Jan   16
    2   Feb    3
    3   Mar   53
    4   Mar   12
    5   Apr   79
    6   Dec   35
    
    0 讨论(0)
  • 2020-12-01 06:24

    Thanks @Brad Solomon for offering a faster way to capitalize string!

    Note 1 @Brad Solomon's answer using pd.categorical should save your resources more than my answer. He showed how to assign order to your categorical data. You should not miss it :P

    Alternatively, you can use.

    df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                      ["aug", 11], ["jan", 11], ["jan", 1]], 
                       columns=["Month", "Price"])
    # Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
    df["Month"] = df["Month"].str.capitalize()
    
    # Now the dataset should look like
    #   Month Price
    #   -----------
    #    Dec    XX
    #    Jan    XX
    #    Apr    XX
    
    # make it a datetime so that we can sort it: 
    # use %b because the data use the abbriviation of month
    df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
    df = df.sort_values(by="Month")
    
    total = (df.groupby(df['Month"])['Price'].mean())
    
    # total 
    Month
    1     17.333333
    3     11.000000
    8     16.000000
    12    12.000000
    

    Note 2 groupby by default will sort group keys for you. Be aware to use the same key to sort and groupby in the df = df.sort_values(by=SAME_KEY) and total = (df.groupby(df[SAME_KEY])['Price'].mean()). Otherwise, one may gets unintended behavior. See Groupby preserve order among groups? In which way? for more information.

    Note 3 A more computationally efficient way is first compute mean and then do sorting on months. In this way, you only need to sort on 12 items rather than the whole df. It will reduce the computational cost if one don't need df to be sorted.

    Note 4 For people already have month as index, and wonder how to make it categorical, take a look at pandas.CategoricalIndex @jezrael has a working example on making categorical index ordered in Pandas series sort by month index

    0 讨论(0)
  • 2020-12-01 06:32

    You can add the numerical month value together with the name in the index (i.e "01 January"), do a sort then strip off the number:

    total=(df.groupby(df['date'].dt.strftime('%m %B'))['price'].mean()).sort_index()
    

    It may look sth like this:

    01 January  xxx
    02 February     yyy
    03 March    zzz
    04 April    ttt
    
     total.index = [ x.split()[1] for x in total.index ]
    
    January xxx
    February yyy
    March zzz
    April ttt
    
    0 讨论(0)
  • 2020-12-01 06:39

    You can use categorical data to enable proper sorting:

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", 
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    df['months'] = pd.Categorical(df['months'], categories=months, ordered=True)
    df.sort_values(...)  # same as you have now; can use inplace=True
    

    When you specify the categories, pandas remembers the order of specification as the default sort order.

    Docs: Pandas categories > sorting & order.

    0 讨论(0)
  • 2020-12-01 06:40

    I would use the calender module and reindex:

    series.str.capitalize helps capitalizing the series , then we create a dictionary with the calender module and map with the series to get month number.

    Once we have the month number we can sort_values() and get the index. Then reindex .

    import calendar
    df.date=df.date.str.capitalize() #capitalizes the series
    d={i:e for e,i in enumerate(calendar.month_abbr)} #creates a dictionary
    #d={i[:3]:e for e,i in enumerate(calendar.month_name)} 
    df.reindex(df.date.map(d).sort_values().index) #map + sort_values + reindex with index
    

      date  price
    2  Apr     13
    1  May     15
    0  Dec     12
    
    0 讨论(0)
  • 2020-12-01 06:42

    You should consider re-indexing it based on axis 0 (indexes)

    new_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
    
    df1 = df.reindex(new_order, axis=0)
    
    0 讨论(0)
提交回复
热议问题