问题
I have spent a few hours now trying to do a "cumulative group by sum" on a pandas dataframe. I have looked at all the stackoverflow answers and surprisingly none of them can solve my (very elementary) problem:
I have a dataframe:
df1
Out[8]:
Name Date Amount
0 Jack 2016-01-31 10
1 Jack 2016-02-29 5
2 Jack 2016-02-29 8
3 Jill 2016-01-31 10
4 Jill 2016-02-29 5
I am trying to
- group by ['Name','Date'] and
- cumsum 'Amount'.
- That is it.
So the desired output is:
df1
Out[10]:
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 23
2 Jill 2016-01-31 10
3 Jill 2016-02-29 15
EDIT: I am simplifying the question. With the current answers I still can't get the correct "running" cumsum. Look closely, I want to see the cumulative sum "10, 23, 10, 15". In words, I want to see, at every consecutive date, the total cumulative sum for a person. NB: If there are two entries on one date for the same person, I want to sum those and then add them to the running cumsum and only then print the sum.
回答1:
You need assign output to new column and then remove Amount
column by drop:
df1['Cumsum'] = df1.groupby(by=['Name','Date'])['Amount'].cumsum()
df1 = df1.drop('Amount', axis=1)
print (df1)
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 5
2 Jack 2016-02-29 13
3 Jill 2016-01-31 10
4 Jill 2016-02-29 5
Another solution with assign:
df1 = df1.assign(Cumsum=df1.groupby(by=['Name','Date'])['Amount'].cumsum())
.drop('Amount', axis=1)
print (df1)
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 5
2 Jack 2016-02-29 13
3 Jill 2016-01-31 10
4 Jill 2016-02-29 5
EDIT by comment:
First groupby
columns Name
and Date
and aggregate sum
, then groupby
by level
Name
and aggregate cumsum
.
df = df1.groupby(by=['Name','Date'])['Amount'].sum()
.groupby(level='Name').cumsum().reset_index(name='Cumsum')
print (df)
Name Date Cumsum
0 Jack 2016-01-31 10
1 Jack 2016-02-29 23
2 Jill 2016-01-31 10
3 Jill 2016-02-29 15
回答2:
Set the index first, then groupby.
df.set_index(['Name', 'Date']).groupby(level=[0, 1]).Amount.cumsum().reset_index()
After the OP changed their question, this is now the correct answer.
df1.groupby(
['Name','Date']
)Amount.sum().groupby(
level='Name'
).cumsum()
This is the same answer provided by jezrael
来源:https://stackoverflow.com/questions/41808951/pandas-group-by-cumsum-keep-columns