Question
I have a pandas dataframe which looks like this:
import pandas as pd

df = pd.DataFrame({
'job': ['football','football', 'football', 'basketball', 'basketball', 'basketball', 'hokey', 'hokey', 'hokey', 'football','football', 'football', 'basketball', 'basketball', 'basketball', 'hokey', 'hokey', 'hokey'],
'team': [4.0,5.0,9.0,2.0,3.0,6.0,1.0,7.0,8.0, 4.0,5.0,9.0,2.0,3.0,6.0,1.0,7.0,8.0],
'cluster': [0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1]
})
Each cluster contains 9 teams. Each cluster has 3 teams of each type of sport: football, basketball and hokey. I want to apply a shift function to each cluster, so that the order of teams changes in a very specific way (I tried to highlight it with color):
How can I do this transformation (shifting rows in the way shown above) for a much larger dataframe?
Answer 1:
Let's use groupby + cumcount to create a sequential counter based on the columns cluster and job, then use sort_values to sort the dataframe on cluster and this counter:
df['j'] = df.groupby(['cluster', 'job']).cumcount()
df = df.sort_values(['cluster', 'j'], ignore_index=True).drop('j', axis=1)
           job  team  cluster
0     football   4.0        0
1   basketball   2.0        0
2        hokey   1.0        0
3     football   5.0        0
4   basketball   3.0        0
5        hokey   7.0        0
6     football   9.0        0
7   basketball   6.0        0
8        hokey   8.0        0
9     football   4.0        1
10  basketball   2.0        1
11       hokey   1.0        1
12    football   5.0        1
13  basketball   3.0        1
14       hokey   7.0        1
15    football   9.0        1
16  basketball   6.0        1
17       hokey   8.0        1
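To see why this works, it can help to look at the counter itself on a smaller frame. The following is only a minimal sketch: the frame `small` and the names `counter` and `interleaved` are made up for illustration and are not part of the original question or answer.

```python
import pandas as pd

# Hypothetical small frame: one cluster, two sports, two teams each.
small = pd.DataFrame({
    'job': ['football', 'football', 'hokey', 'hokey'],
    'team': [4.0, 5.0, 1.0, 7.0],
    'cluster': [0, 0, 0, 0],
})

# cumcount numbers the rows inside each (cluster, job) group,
# starting from 0 in every group.
counter = small.groupby(['cluster', 'job']).cumcount()

# Sorting on cluster and then on the counter places the first row of
# every sport first, then the second row of every sport, and so on.
interleaved = (
    small.assign(j=counter)
         .sort_values(['cluster', 'j'], ignore_index=True)
         .drop(columns='j')
)

print(counter.tolist())              # [0, 1, 0, 1]
print(interleaved['team'].tolist())  # [4.0, 1.0, 5.0, 7.0]
```

Using assign here keeps the helper column out of the original frame, so nothing has to be cleaned up if the sort is abandoned halfway. Note that ignore_index in sort_values needs pandas 1.0 or later.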
Source: https://stackoverflow.com/questions/64194412/shift-rows-in-pandas-dataframe-in-a-specific-order