DateOccurred CostCentre TimeDifference
03/09/2012 2073 28138
03/09/2012 6078 34844
03/09/2012 8273 31215
03/09/2012 8367 28160
03/09/2012 8959 32037
Perhaps group by CostCentre first, then use Series/DataFrame resample() on each group?
In [72]: centers = {}
In [73]: for center, group in df.groupby("CostCentre"):
   ....:     timediff = group.set_index("DateOccurred")["TimeDifference"]
   ....:     centers[center] = timediff.resample("W").sum()
In [77]: pd.concat(centers, names=['CostCentre'])
Out[77]:
CostCentre  DateOccurred
0           2012-09-09           0
            2012-09-16       89522
            2012-09-23           6
            2012-09-30         161
2073        2012-09-09      141208
            2012-09-16      113024
            2012-09-23      169599
            2012-09-30      170780
6078        2012-09-09      171481
            2012-09-16      160871
            2012-09-23      153976
            2012-09-30      122972
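A self-contained sketch of the same split/resample/concat idea, built from the five sample rows in the question (the 32037 for centre 8959 is taken from the read_clipboard output further down). It assumes DateOccurred holds real datetimes and uses the current pandas spelling resample("W").sum() instead of the how= argument used in older versions. "W" bins weeks that end on Sunday, so 3 September 2012 lands in the week labelled 2012-09-09.

```python
import pandas as pd

# The five sample rows from the question (dates are day/month/year).
df = pd.DataFrame({
    "DateOccurred": pd.to_datetime(["03/09/2012"] * 5, format="%d/%m/%Y"),
    "CostCentre": [2073, 6078, 8273, 8367, 8959],
    "TimeDifference": [28138, 34844, 31215, 28160, 32037],
})

# Split: one sub-frame per cost centre.
# Apply: weekly (Sunday-ending) sum of that centre's time differences.
centers = {}
for center, group in df.groupby("CostCentre"):
    timediff = group.set_index("DateOccurred")["TimeDifference"]
    centers[center] = timediff.resample("W").sum()

# Combine: the dict keys become the outer "CostCentre" index level;
# the inner level keeps the resampled index name, "DateOccurred".
weekly = pd.concat(centers, names=["CostCentre"])
print(weekly)
```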
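Additional details: when parse_dates=True is passed to the pd.read_* functions such as read_csv (read_clipboard hands its arguments to read_csv), pandas tries to parse the index as dates, so index_col must also be set.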
In [28]: df = pd.read_clipboard(sep=r'\s+', parse_dates=True, index_col=0,
   ....:                        dayfirst=True)
In [30]: df.head()
Out[30]:
              CostCentre  TimeDifference
DateOccurred
2012-09-03          2073           28138
2012-09-03          6078           34844
2012-09-03          8273           31215
2012-09-03          8367           28160
2012-09-03          8959           32037
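Because read_clipboard depends on the system clipboard, here is a sketch of the same parse done with pd.read_csv on an in-memory string holding the question's sample. sep=r"\s+" splits on runs of whitespace. index_col=0 turns DateOccurred into the index, which is what parse_dates=True then parses, and dayfirst=True reads 03/09/2012 as 3 September.

```python
import io

import pandas as pd

text = """DateOccurred CostCentre TimeDifference
03/09/2012 2073 28138
03/09/2012 6078 34844
03/09/2012 8273 31215
03/09/2012 8367 28160
03/09/2012 8959 32037
"""

# parse_dates=True only parses the index, hence index_col=0.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", index_col=0,
                 parse_dates=True, dayfirst=True)

# The index is already a DatetimeIndex, so resample() works directly.
print(type(df.index).__name__)
print(df["TimeDifference"].resample("W").sum())
```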
Since resample() requires a datetime-like index (for example a DatetimeIndex), setting the index while reading the data removes the need to set it for each group individually. GroupBy objects also have an apply method, which performs the whole split-apply-combine in one call; its "combine" step is the same concatenation done by hand with pd.concat() above.
In [37]: x = df.groupby("CostCentre")["TimeDifference"].apply(
   ....:     lambda s: s.resample("W").sum())
In [38]: x.head(12)
Out[38]:
CostCentre  DateOccurred
0           2012-09-09           0
            2012-09-16       89522
            2012-09-23           6
            2012-09-30         161
2073        2012-09-09      141208
            2012-09-16      113024
            2012-09-23      169599
            2012-09-30      170780
6078        2012-09-09      171481
            2012-09-16      160871
            2012-09-23      153976
            2012-09-30      122972
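GroupBy objects also have their own resample method, which chains the split, the per-group weekly sum and the combine into one expression. The sketch below, using the five sample rows with DateOccurred as the index, checks that it gives the same Series as the explicit pd.concat route. A caveat with the apply spelling: if every group's result has an identical index, DataFrameGroupBy.apply can return a DataFrame (one column per week) instead of a MultiIndexed Series. Selecting the TimeDifference column before applying or resampling avoids that.

```python
import pandas as pd

# Same five sample rows, with DateOccurred as a DatetimeIndex.
dates = pd.to_datetime(["03/09/2012"] * 5, format="%d/%m/%Y")
df = pd.DataFrame(
    {"CostCentre": [2073, 6078, 8273, 8367, 8959],
     "TimeDifference": [28138, 34844, 31215, 28160, 32037]},
    index=pd.DatetimeIndex(dates, name="DateOccurred"),
)

# One chained call: split by centre, weekly-sum each group, combine.
direct = df.groupby("CostCentre")["TimeDifference"].resample("W").sum()

# The explicit split/apply/combine route, for comparison.
via_concat = pd.concat(
    {c: g["TimeDifference"].resample("W").sum()
     for c, g in df.groupby("CostCentre")},
    names=["CostCentre"],
)

print(direct)
# Same (centre, week) keys in the same order, with the same totals.
assert list(direct.items()) == list(via_concat.items())
```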
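Here's a way to take your input (as text) and group it the way you want without pandas. The key is to use nested dictionaries: the outer one is keyed by (ISO year, week), and each of its values is an inner dictionary mapping centre to total time difference.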
import collections
import datetime
import functools

def delta_totals_by_date_and_centre(in_file):
    # Use a defaultdict instead of a normal dict so that missing values are
    # created automatically. by_date maps a (year, week) tuple to another
    # mapping (dict) from centre to total delta time.
    by_date = collections.defaultdict(functools.partial(collections.defaultdict, int))

    # For each line in the input (no header row expected)...
    for line in in_file:
        # Parse the three fields of each line into date, int, int.
        date, centre, delta = line.split()
        date = datetime.datetime.strptime(date, "%d/%m/%Y").date()
        centre = int(centre)
        delta = int(delta)

        # Determine the ISO year and week of the year.
        year, week, weekday = date.isocalendar()
        year_and_week = year, week

        # Add the time delta.
        by_date[year_and_week][centre] += delta

    # Yield each result, in order.
    for year_and_week, by_centre in sorted(by_date.items()):
        for centre, delta in sorted(by_centre.items()):
            yield year_and_week, centre, delta
For your sample input, it produces the following output (the first column is the ISO year and week number, written as year-week):
2012-36 0 0
2012-36 2073 141208
2012-36 6078 171481
2012-36 7042 27129
2012-36 7569 124600
2012-36 8239 82153
2012-36 8273 154517
2012-36 8367 113339
2012-36 8959 82770
2012-36 9292 128089
2012-36 9532 137491
2012-36 9705 146321
2012-36 10085 151483
2012-36 10220 87496
2012-36 14573 186
2012-37 0 89522
2012-37 2073 113024
2012-37 6078 160871
2012-37 7042 35063
2012-37 7097 30866
2012-37 8239 61744
2012-37 8273 153898
2012-37 8367 93564
2012-37 8959 116727
2012-37 9292 132628
2012-37 9532 121462
2012-37 9705 139992
2012-37 10085 111229
2012-37 10220 91245
2012-38 0 6
2012-38 2073 169599
2012-38 6078 153976
2012-38 7097 34909
2012-38 7569 152958
2012-38 8239 122693
2012-38 8273 119536
2012-38 8367 116157
2012-38 8959 75579
2012-38 9292 128340
2012-38 9532 163278
2012-38 9705 95205
2012-38 10085 94284
2012-38 10220 92318
2012-38 14573 468
2012-39 0 161
2012-39 2073 170780
2012-39 6078 122972
2012-39 7042 34953
2012-39 7097 63475
2012-39 7569 92371
2012-39 8239 194048
2012-39 8273 123332
2012-39 8367 115365
2012-39 8959 104609
2012-39 9292 131369
2012-39 9532 143933
2012-39 9705 123107
2012-39 10085 129276
2012-39 10220 124681
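A hedged variant, not the answer's own code: the two-level dictionary can be flattened into a single collections.Counter keyed on ((year, week), centre), and sorting its keys gives the same week-then-centre order. The sketch prints rows in the year-week centre total layout shown above and uses date.fromisocalendar (Python 3.8+) to show that ISO week 36 of 2012 ends on Sunday 2012-09-09, the label pandas gives that week with resample("W"). That is why the two answers' totals line up. Skipping the header and malformed lines, and the zero-padded week number, are assumptions of this sketch.

```python
import collections
import datetime


def weekly_totals(lines):
    """Sum TimeDifference per ((ISO year, ISO week), centre), sorted."""
    totals = collections.Counter()
    for line in lines:
        fields = line.split()
        # Assumption: skip blank lines, the header row, and any line
        # that does not have exactly three fields.
        if len(fields) != 3 or fields[0] == "DateOccurred":
            continue
        date = datetime.datetime.strptime(fields[0], "%d/%m/%Y").date()
        year, week, _ = date.isocalendar()
        totals[(year, week), int(fields[1])] += int(fields[2])
    return sorted(totals.items())


def week_ending(year, week):
    """Sunday that closes ISO week `week` of `year`."""
    return datetime.date.fromisocalendar(year, week, 7)


sample = """DateOccurred CostCentre TimeDifference
03/09/2012 2073 28138
03/09/2012 6078 34844
03/09/2012 8273 31215
03/09/2012 8367 28160
03/09/2012 8959 32037
""".splitlines()

for ((year, week), centre), total in weekly_totals(sample):
    # Assumed layout: zero-padded "year-week centre total".
    print(f"{year}-{week:02d} {centre} {total}")
print(week_ending(2012, 36))
```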