问题
I have a pandas dataframe that looks as follows:
ID round player1 player2
1 1 A B
1 2 A C
1 3 B D
2 1 B C
2 2 C D
2 3 C E
3 1 B C
3 2 C D
3 3 C A
The dataframe contains sport match results, where the ID
column denotes one tournament, the round
column denotes the round for each tournament, and player1
and player2
columns contain the names of players that played against eachother in the respective round
.
I now want to cumulatively count the tournament participations for, say, player A
. In pseudocode this means: If the player with name A
comes up in either the player1
or player2
column per tournament ID
, increment the counter by 1.
The result should look like this (note: in my example player A
did participate in tournaments with the ID
s 1 and 3):
ID round player1 player2 playerAparticipated
1 1 A B 1
1 2 A C 1
1 3 B D 1
2 1 B C 0
2 2 C D 0
2 3 C E 0
3 1 B C 2
3 2 C D 2
3 3 C A 2
My current status is, that I added a "helper" column containing the values 1
or 0
denoting, if the respective player participated in the tournament:
ID round player1 player2 helper
1 1 A B 1
1 2 A C 1
1 3 B D 1
2 1 B C 0
2 2 C D 0
2 3 C E 0
3 1 B C 1
3 2 C D 1
3 3 C A 1
I think that I just need one final step, e.g., a smart use of cumsum()
that counts the helper
column in the desired way. However, I could not come up with the solution yet.
回答1:
I think you need:
- drop_duplicates by column
ID
first and then set_index - filter out
0
values by boolean indexing, cumsum and last reindex for add0
for missing index values - new column create by map
df1 = df.drop_duplicates('ID').set_index('ID')
s = df1.loc[df1['helper'] != 0, 'helper'].cumsum().reindex(index=df1.index, fill_value=0)
df['playerAparticipated'] = df['ID'].map(s)
print (df)
ID round player1 player2 helper playerAparticipated
0 1 1 A B 1 1
1 1 2 A C 1 1
2 1 3 B D 1 1
3 2 1 B C 0 0
4 2 2 C D 0 0
5 2 3 C E 0 0
6 3 1 B C 1 2
7 3 2 C D 1 2
8 3 3 C A 1 2
Instead map
is possible use join with rename
:
df = df.join(s.rename('playerAparticipated'), on='ID')
print (df)
ID round player1 player2 helper playerAparticipated
0 1 1 A B 1 1
1 1 2 A C 1 1
2 1 3 B D 1 1
3 2 1 B C 0 0
4 2 2 C D 0 0
5 2 3 C E 0 0
6 3 1 B C 1 2
7 3 2 C D 1 2
8 3 3 C A 1 2
回答2:
A similar approach to @jezrael that I cooked up a little slower :).
First, move ID
into your index:
df = df.reset_index().set_index(['index','ID'])
# round player1 player2 helper
# index ID
# 0 1 1 A B 1
# 1 1 2 A C 1
# 2 1 3 B D 1
# 3 2 1 B C 0
# 4 2 2 C D 0
# 5 2 3 C E 0
# 6 3 1 B C 1
# 7 3 2 C D 1
# 8 3 3 C A 1
Next, filter out rows where helper
is 0 and get a cumulative sum of tournaments by ID
, and assign the result to a variable:
tournament_count = df[df['helper'] > 0].groupby(['ID','helper']).first().reset_index(level=1)['helper'].cumsum().rename("playerAparticipated")
# ID
# 1 1
# 3 2
Finally, join the tournament_count
DataFrame with df:
df.join(tournament_counts, how="left").fillna(0)
# round player1 player2 helper tournament_counts
# index ID
# 0 1 1 A B 1 1.0
# 1 1 2 A C 1 1.0
# 2 1 3 B D 1 1.0
# 3 2 1 B C 0 0.0
# 4 2 2 C D 0 0.0
# 5 2 3 C E 0 0.0
# 6 3 1 B C 1 2.0
# 7 3 2 C D 1 2.0
# 8 3 3 C A 1 2.0
来源:https://stackoverflow.com/questions/45119623/pandas-count-over-groups