Grouping DataFrame by start of decade using pandas Grouper

死守一世寂寞 2021-01-12 15:05

I have a dataframe of daily observations from 01-01-1973 to 12-31-2014.

I have been using pandas Grouper, and it has worked fine for every frequency until now, when I need groups that begin at the start of each decade.

4 Answers
  • 2021-01-12 15:20

    pd.cut also works: it bins the dates at a regular frequency, starting from a year you choose.

    import pandas as pd
    df
                     date  val
    0 1970-01-01 00:01:18    1
    1 1979-12-31 18:01:01   12
    2 1980-01-01 00:00:00    2
    3 1989-01-01 00:00:00    3
    4 2014-05-06 00:00:00    4
    
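    bins = pd.date_range('1970', '2020', freq='10YS')  # decade edges, 1970 through 2020
    # select 'val' so the datetime column is not averaged; observed=False keeps empty decades
    df.groupby(pd.cut(df.date, bins, right=False), observed=False)[['val']].mean()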
    #                          val
    #date                         
    #[1970-01-01, 1980-01-01)  6.5
    #[1980-01-01, 1990-01-01)  2.5
    #[1990-01-01, 2000-01-01)  NaN
    #[2000-01-01, 2010-01-01)  NaN
    #[2010-01-01, 2020-01-01)  4.0
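
    A self-contained sketch of the pd.cut approach, in case it helps to run it end to end. The frame is rebuilt from the five rows shown above; the names edges, decade and decade_mean are only illustrative.

```python
import pandas as pd

# Sample frame mirroring the five rows shown above.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "1970-01-01 00:01:18",
        "1979-12-31 18:01:01",
        "1980-01-01 00:00:00",
        "1989-01-01 00:00:00",
        "2014-05-06 00:00:00",
    ]),
    "val": [1, 12, 2, 3, 4],
})

# Decade edges: 1970-01-01, 1980-01-01, ..., 2020-01-01.
edges = pd.date_range("1970", "2020", freq="10YS")

# right=False makes each bin closed on the left: [1970, 1980), [1980, 1990), ...
decade = pd.cut(df["date"], edges, right=False)

# observed=False keeps the decades with no rows (1990s, 2000s) as NaN.
decade_mean = df.groupby(decade, observed=False)["val"].mean()
print(decade_mean)
```

    Selecting "val" before calling mean() keeps the datetime column out of the aggregation, so the result is a Series with one entry per decade.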
    
  • 2021-01-12 15:22

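    Something like this, assuming the dates are the index: take the first three digits of the year and append '0' to get the decade label.

    df.groupby(df.index.astype(str).str[:3]+'0').mean()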
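
    A small runnable sketch of the string-slicing idea, assuming a DatetimeIndex. The sample values are made up for illustration, and the slice takes the first three characters of each date string so that '197' + '0' becomes '1970'.

```python
import pandas as pd

# Hypothetical sample: a DatetimeIndex with one value per row.
idx = pd.to_datetime(["1973-01-01", "1979-12-31", "1980-01-01", "2014-12-31"])
df = pd.DataFrame({"data": [1.0, 3.0, 5.0, 7.0]}, index=idx)

# "1973-01-01" -> "197" -> "1970": the decade label as a string.
decade_label = df.index.astype(str).str[:3] + "0"

decade_mean = df.groupby(decade_label).mean()
print(decade_mean)
```

    The group keys here are strings, not dates or integers, which matters if the result is later joined or sorted against numeric years.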
    
  • 2021-01-12 15:24

    You can do a little arithmetic on the year to floor it to the start of its decade (this assumes the dates are the index):

    df.groupby(df.index.year // 10 * 10).mean()
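
    As a rough end-to-end sketch, with a daily index covering the question's date range and placeholder values (the column name data and its contents are assumptions):

```python
import pandas as pd

# Daily index covering the range in the question; the values are placeholders.
idx = pd.date_range("1973-01-01", "2014-12-31", freq="D")
df = pd.DataFrame({"data": range(len(idx))}, index=idx)

# Integer division drops the last digit of the year: 1973 // 10 * 10 == 1970.
decade = df.index.year // 10 * 10

decade_mean = df.groupby(decade).mean()
print(decade_mean)
```

    If the dates live in a column rather than the index, the same arithmetic can be applied to df['date'].dt.year.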
    
  • 2021-01-12 15:45

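    @cᴏʟᴅsᴘᴇᴇᴅ's method is cleaner than this, but if you want to keep your pd.Grouper approach, one way is to merge your data with a new date range that starts at the beginning of a decade and ends at the end of a decade, then use your Grouper on the result. Grouper anchors its 10-year bins at the first date in the data, so padding the range back to 1970 makes the bins line up with decades. For example, given an initial df: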

            date      data
    0     1973-01-01 -1.097895
    1     1973-01-02  0.834253
    2     1973-01-03  0.134698
    3     1973-01-04 -1.211177
    4     1973-01-05  0.366136
    ...
    15335 2014-12-27 -0.566134
    15336 2014-12-28 -1.100476
    15337 2014-12-29  0.115735
    15338 2014-12-30  1.635638
    15339 2014-12-31  1.930645
    

    Merge that with a date_range dataframe spanning 1970 through 2019:

    new_df = pd.DataFrame({'date':pd.date_range(start='01-01-1970', end='12-31-2019', freq='D')})
    
    df = new_df.merge(df, on ='date', how='left')
    

    And use your Grouper:

    df.groupby(pd.Grouper(key='date', freq='10YS')).mean()  # 'YS' replaces the deprecated 'AS' alias
    

    Which gives you:

                    data
    date                
    1970-01-01 -0.005455
    1980-01-01  0.028066
    1990-01-01  0.011122
    2000-01-01  0.011213
    2010-01-01  0.029592
    

    The same, but in one go, could look like this:

    # starting from the original df, pad to whole decades and group in one chain
    (df.merge(pd.DataFrame(
        {'date':pd.date_range(start='01-01-1970',
                              end='12-31-2019',
                              freq='D')}),
              on='date', how='right')
     .groupby(pd.Grouper(key='date', freq='10YS'))
     .mean())
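
    For anyone who wants to try the padding approach without the original data, here is a sketch using randomly generated stand-in values (so the means will differ from the numbers above). It uses the '10YS' alias, which newer pandas releases prefer over '10AS'; the names full, padded and decade_mean are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: daily values, 1973 through 2014.
rng = np.random.default_rng(0)
dates = pd.date_range("1973-01-01", "2014-12-31", freq="D")
df = pd.DataFrame({"date": dates, "data": rng.standard_normal(len(dates))})

# Pad to whole decades so the first 10-year bin starts on 1970-01-01.
full = pd.DataFrame({"date": pd.date_range("1970-01-01", "2019-12-31", freq="D")})
padded = full.merge(df, on="date", how="left")  # padding rows get NaN in "data"

# mean() skips NaN, so the padding rows do not affect the decade averages.
decade_mean = padded.groupby(pd.Grouper(key="date", freq="10YS")).mean()
print(decade_mean)
```

    The padding rows only shift where the bins start; because they hold NaN, each decade's mean is still computed from the real observations alone.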
    