Groupby with TimeGrouper 'backwards'

遥遥无期 2021-01-01 02:23

I have a DataFrame containing a time series:

import pandas as pd

rng = pd.date_range('2016-06-01', periods=24*7, freq='H')
ones = pd.Series([1]*24*7, rng)
rdf =         


        
2 Answers
  •  生来不讨喜
    2021-01-01 03:18

    Since the question now focuses on grouping by week, you can simply:

    rdf.resample('W-{}'.format(rdf.index[-1].strftime('%a')), closed='right', label='right').sum()
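
    As a self-contained sketch of the weekly case: the question's frame is cut off above, so the construction of rdf here (hourly ones in a column 'a') is an assumption inferred from the outputs further down. The anchor is taken from day_name() rather than strftime so that it does not depend on the locale.

```python
import pandas as pd

# Assumed reconstruction of the question's frame: one week of hourly ones in column 'a'.
rng = pd.date_range('2016-06-01', periods=24 * 7, freq=pd.Timedelta(hours=1))
rdf = pd.DataFrame({'a': 1}, index=rng)

# Anchor the weekly rule on the weekday of the last timestamp (here a Tuesday).
anchor = rdf.index[-1].day_name()[:3].upper()
weekly = rdf.resample('W-{}'.format(anchor), closed='right', label='right').sum()
print(weekly)
```

    With closed='right' and label='right', each week ends on, and is labelled by, the weekday of the last observation.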
    

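    You can also use loffset with .resample() to get this to work, at least for most periods. Note that loffset was deprecated in pandas 1.1 and removed in 2.0, so this needs an older pandas: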

    for i in range(2, 7):
        print(i)
        print(rdf.resample('{}D'.format(i), closed='right', loffset='{}D'.format(i)).sum())
    
    2
                 a
    2016-06-01  24
    2016-06-03  48
    2016-06-05  48
    2016-06-07  48
    3
                 a
    2016-06-01  24
    2016-06-04  72
    2016-06-07  72
    4
                 a
    2016-06-01  24
    2016-06-05  96
    2016-06-09  48
    5
                  a
    2016-06-01   24
    2016-06-06  120
    2016-06-11   24
    6
                  a
    2016-06-01   24
    2016-06-07  144
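
    On newer pandas, where loffset is no longer available, a possible alternative is the origin argument of .resample(). This is a sketch, not the approach used above: it assumes pandas 1.3 or later (where origin='end_day' exists) and the same assumed rdf frame with a column 'a'. Each row is labelled by the left edge of its bin, so the first label may fall before the first data point.

```python
import pandas as pd

# Assumed reconstruction of the question's frame: one week of hourly ones in column 'a'.
rng = pd.date_range('2016-06-01', periods=24 * 7, freq=pd.Timedelta(hours=1))
rdf = pd.DataFrame({'a': 1}, index=rng)

results = {}
for n in range(2, 7):
    # origin='end_day' anchors the bin edges at the midnight following the last timestamp,
    # so the bins are counted backwards from the end of the data.
    # closed='left' keeps each calendar day whole inside a single bin.
    results[n] = rdf.resample('{}D'.format(n), origin='end_day',
                              closed='left', label='left').sum()
    print(n)
    print(results[n])
```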
    

    However, you could also create custom groupings that compute the correct values without TimeGrouper. The version below keys on the day of the month, so it assumes all the data falls within a single month:

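    # Unique days of the month, newest first, so that chunking starts at the last day.
    days = rdf.index.to_series().dt.day.unique()[::-1]
    for n in range(2, 7):
        # Split into chunks of n days, then put the chunks back in chronological order.
        chunks = [days[i:i + n] for i in range(0, len(days), n)][::-1]
        # Map each day of the month to the number of the chunk it belongs to.
        grp = pd.Series({day: idx for idx, chunk in enumerate(chunks) for day in chunk})
        groups = rdf.index.to_series().dt.day.map(grp).rename('groups')
        print(n)
        print(rdf.groupby(groups)['a'].sum())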
    
     2
    groups
    0    24
    1    48
    2    48
    3    48
    Name: a, dtype: int64
    
     3
    groups
    0    24
    1    72
    2    72
    Name: a, dtype: int64
    
     4
    groups
    0    72
    1    96
    Name: a, dtype: int64
    
     5
    groups
    0     48
    1    120
    Name: a, dtype: int64
    
     6
    groups
    0     24
    1    144
    Name: a, dtype: int64
    
