I am trying to create a column that holds a cumulative sum based on 2 columns. Please see the example of what I am trying to do:
index lodgement_yea
If we only need to consider the column 'words', one option is to loop through the unique values of that column:
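# Build the running total one word at a time.
for unique_words in df_2.words.unique():
    # Cumulative sum over only the rows for this word (the index is preserved).
    word_cum_sum = df_2.loc[df_2['words'] == unique_words, 'sum'].cumsum()
    if 'cum_sum' not in df_2:
        # First word: create the column; rows for other words start as NaN.
        df_2['cum_sum'] = word_cum_sum
    else:
        # Later words: fill in their rows, aligned on the index.
        df_2.update(pd.DataFrame({'cum_sum': word_cum_sum}))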
The above results in:
>>> print(df_2)
   lodgement_year  sum      words  cum_sum
0            2000   14        the     14.0
1            2000   10  australia     10.0
2            2000   12       word     12.0
3            2000    8      brand      8.0
4            2000    5      fresh      5.0
5            2001    8        the     22.0
6            2001    3  australia     13.0
7            2001    1     banana      1.0
8            2001    7      brand     15.0
9            2001    1      fresh      6.0
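
As a possible alternative to the loop (a sketch, not part of the original answer), pandas can compute the same per-word running total with `groupby(...).cumsum()`. The frame below is rebuilt from the printed output above so that the example runs on its own. It assumes the rows are already ordered by `lodgement_year`; if they are not, sorting first would be needed.

```python
import pandas as pd

# Sample data reconstructed from the printed output above (an assumption;
# the original df_2 is not shown in full).
df_2 = pd.DataFrame({
    "lodgement_year": [2000] * 5 + [2001] * 5,
    "sum": [14, 10, 12, 8, 5, 8, 3, 1, 7, 1],
    "words": ["the", "australia", "word", "brand", "fresh",
              "the", "australia", "banana", "brand", "fresh"],
})

# Running total of 'sum' within each word, in the existing row order.
df_2["cum_sum"] = df_2.groupby("words")["sum"].cumsum()

print(df_2)
```

With this approach the `cum_sum` column stays an integer column, because no NaN placeholders are created along the way.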