Question
I have a dataframe:
import pandas as pd
df = pd.DataFrame([[1, 'a'],
                   [1, 'a'],
                   [1, 'b'],
                   [1, 'a'],
                   [2, 'a'],
                   [2, 'b'],
                   [2, 'a'],
                   [2, 'b'],
                   [3, 'b'],
                   [3, 'a'],
                   [3, 'b'],
                   ], columns=['session', 'issue'])
df
I would like to rank issues within each session by how often they occur. I tried:
df.groupby(['session', 'issue']).size().rank(ascending=False, method='dense')
session issue
1 a 1.0
b 3.0
2 a 2.0
b 2.0
3 a 3.0
b 2.0
dtype: float64
What I need is a result like this:
- for group session=1, there are three a issues and one b issue, so the ranks should be a = 1 and b = 2
- for group session=2, both issues occur equally often, so they should share the same rank = 1
- for group session=3, there are two b issues and one a issue, so the ranks should be b = 1 and a = 2
Also, why don't the ranks start from 1, 2, 3... within each group?
Answer 1:
Use SeriesGroupBy.rank (size() returns a Series), grouping by the first level of the MultiIndex (session):
s = (df.groupby(['session', 'issue'])
       .size()
       .groupby(level=0)
       .rank(ascending=False, method='dense'))
print(s)
session issue
1 a 1.0
b 2.0
2 a 1.0
b 1.0
3 a 2.0
b 1.0
dtype: float64
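As for why the first attempt does not restart at 1 in each session: calling rank directly on the result of size() ranks all six counts together, because Series.rank does not look at the index levels. The sketch below contrasts the two calls and then puts the counts and per-session ranks side by side in one frame. The names counts, global_rank, per_session_rank, result and the column rank_in_session are chosen here for illustration and are not part of the original answer; the cast to int assumes there are no missing values.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 'a'], [1, 'a'], [1, 'b'], [1, 'a'],
     [2, 'a'], [2, 'b'], [2, 'a'], [2, 'b'],
     [3, 'b'], [3, 'a'], [3, 'b']],
    columns=['session', 'issue'],
)

# Number of rows per (session, issue) pair, indexed by a two-level MultiIndex.
counts = df.groupby(['session', 'issue']).size()

# Ranking the Series directly compares all six counts with each other,
# so the ranks are global rather than per session.
global_rank = counts.rank(ascending=False, method='dense')

# Grouping by the 'session' index level first restarts the ranking
# inside every session. The cast to int assumes no NaN values.
per_session_rank = (
    counts.groupby(level='session')
          .rank(ascending=False, method='dense')
          .astype(int)
)

# Illustrative tidy frame: one row per (session, issue) with count and rank.
result = (
    counts.rename('count')
          .to_frame()
          .assign(rank_in_session=per_session_rank)
          .reset_index()
)
print(result)
```

Grouping with level='session' is equivalent to level=0 here, since session is the first index level; using the name may be easier to read when the index has several levels.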
Source: https://stackoverflow.com/questions/54530503/pandas-groupby-and-rank-within-groups-that-start-with-1-for-each-group