I have a pandas dataframe containing the following data:
matchID server court speed
1 1 A 100
1 2 D 20
With groupby, we can still use loc to select the parts we want to replace, but put the whole computation inside a for loop over df.groupby("matchID").
for id, subg in df.groupby("matchID"):
    df.loc[df.matchID==id, "meanSpeedCourtA13"] = (subg
        .where(subg.server.isin([1,3])).where(subg.court == "A").speed.mean())
    df.loc[df.matchID==id, "meanSpeedCourtD13"] = (subg
        .where(subg.server.isin([1,3])).where(subg.court == "D").speed.mean())
Special thanks to @Dark for pointing out that I was hard-coding the groupby.
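To make the loop above easy to try, here is a minimal self-contained sketch. The sample rows are hypothetical (only the first two come from the question), and the loop variable is named match_id instead of id purely to avoid shadowing the Python built-in.

```python
import pandas as pd

# Hypothetical sample data; only the first two rows appear in the question.
df = pd.DataFrame({
    "matchID": [1, 1, 1, 2, 2, 2],
    "server":  [1, 2, 3, 1, 3, 2],
    "court":   ["A", "D", "A", "A", "D", "D"],
    "speed":   [100, 20, 80, 60, 40, 30],
})

for match_id, subg in df.groupby("matchID"):
    # Rows that fail either condition are replaced by NaN, not dropped.
    masked = subg.where(subg.server.isin([1, 3])).where(subg.court == "A")
    # mean() skips NaN, so only the rows that passed both filters count.
    df.loc[df.matchID == match_id, "meanSpeedCourtA13"] = masked.speed.mean()

print(df)
```

With this sample data, every row of matchID 1 gets 90.0 (the mean of 100 and 80) and every row of matchID 2 gets 60.0.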
loc selects values using information from two axes: rows and columns. Following the documentation, the row selector comes first and the column selector second. For example, in df.loc[df.matchID==id, "meanSpeedCourtD13"], df.matchID==id selects the rows whose matchID equals id, and "meanSpeedCourtD13" specifies the column we want to look into.
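As a small illustration of the rows-first, columns-second pattern, the sketch below reads and then assigns through loc. The dataframe and the "flag" column are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"matchID": [1, 1, 2], "speed": [100, 20, 50]})

# Rows: a boolean mask. Columns: a single label. This returns a Series.
selected = df.loc[df.matchID == 1, "speed"]
print(selected)

# The same selector on the left-hand side assigns to just those cells.
# "flag" is a hypothetical new column; rows outside the mask are left as NaN.
df.loc[df.matchID == 1, "flag"] = 1.0
print(df)
```

Assigning to a column that does not exist yet creates it, which is why the loop over groups can fill the new column one matchID at a time.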
Side notes about calculating the mean:
subg is the sub-dataframe for a single matchID.
.where(subg.server.isin([1,3])) masks out the rows whose server is not in [1, 3].
.where(subg.court == "A") further masks out the rows whose court is not "A".
.speed.mean() computes the mean of speed; the masked rows are NaN, which mean() skips.

As an alternative, you can use np.where to assign values to each matchID in [1, 2]. This works only when matchID takes exactly two values. On my computer it ran at roughly the same speed as the groupby method above. To save space, we only demonstrate with the "meanSpeedCourtA13" column.
import numpy as np

# First we calculate the means
# Calculate mean for the group with matchID being 1
meanSpeedCourtA13_ID1 = (df[df.matchID==1]
    .where(df.server.isin([1,3])).where(df.court == "A").speed.mean())
# Calculate mean for the group with matchID being 2
meanSpeedCourtA13_ID2 = (df[df.matchID==2]
    .where(df.server.isin([1,3])).where(df.court == "A").speed.mean())
# Use np.where to allocate values to each matchID in [1, 2]
df["meanSpeedCourtA13"] = np.where(df.matchID == 1,
    meanSpeedCourtA13_ID1, meanSpeedCourtA13_ID2)
np.where(condition, x, y) returns x where condition is met and y otherwise, element by element. See the np.where documentation.
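A minimal sketch of that element-wise behaviour, using a hypothetical matchID series and placeholder means rather than values computed from the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical matchID values and placeholder means, for illustration only.
match_id = pd.Series([1, 2, 1, 2])
mean_id1, mean_id2 = 90.0, 60.0

# Where the condition is True take mean_id1, otherwise take mean_id2.
result = np.where(match_id == 1, mean_id1, mean_id2)
print(result)
```

The result is a NumPy array with one entry per row, which is why it can be assigned directly to a dataframe column of the same length.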