I am trying to create a column that does a cumulative sum using 2 columns , please see example of what I am trying to do :@Faith Akici
index lodgement_yea
You are almost there, Ian!
cumsum()
method calculates the cumulative sum of a Pandas column. You are looking for that applied to the grouped words
. Therefore:
In [303]: df_2['cumsum'] = df_2.groupby(['words'])['sum'].cumsum()
In [304]: df_2
Out[304]:
index lodgement_year words sum cum_sum cumsum
0 0 2000 the 14 14 14
1 1 2000 australia 10 10 10
2 2 2000 word 12 12 12
3 3 2000 brand 8 8 8
4 4 2000 fresh 5 5 5
5 5 2001 the 8 22 22
6 6 2001 australia 3 13 13
7 7 2001 banana 1 1 1
8 8 2001 brand 7 15 15
9 9 2001 fresh 1 6 6
Please comment if this fails on your bigger data set, and we'll work on a possibly more accurate version of this.