Python Pandas average based on condition into new column

猫巷女王i 2021-02-09 16:49

I have a pandas dataframe containing the following data:

matchID    server    court    speed
1          1         A         100
1          2         D         20         


        
3 answers
  • 2021-02-09 17:34

    OK, this got a bit more complicated. Normally I'd try something with transform, but I'd be glad if someone had something better than the following:

    Use groupby and pass each sub-frame to func, which uses .loc to pick the matching rows; finally, use pd.concat to glue the pieces back together:

    import pandas as pd
    
    data = {'matchID': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 2, 9: 2, 10: 2, 
                        11: 2, 12: 2, 13: 2, 14: 2, 15: 2}, 
    'court': {0: 'A', 1: 'D', 2: 'D', 3: 'A', 4: 'A', 5: 'A', 6: 'D', 7: 'D', 8: 'A',
              9: 'D', 10: 'D', 11: 'A', 12: 'A', 13: 'A', 14: 'D', 15: 'D'}, 
    'speed': {0: 100, 1: 200, 2: 300, 3: 100, 4: 120, 5: 250, 6: 110, 7: 100, 8: 100, 
              9: 200, 10: 300, 11: 100, 12: 120, 13: 250, 14: 110, 15: 100}, 
    'server': {0: 1, 1: 2, 2: 3, 3: 4, 4: 1, 5: 2, 6: 3, 7: 4, 8: 1, 9: 2, 10: 3, 
               11: 4, 12: 1, 13: 2, 14: 3, 15: 4}}
    
    df = pd.DataFrame(data)
    
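    def func(dfx):
        # Work on a copy so the assignments do not write into a groupby slice
        dfx = dfx.copy()
        served_13 = dfx.server.isin((1, 3))
        # Mean speed of servers 1 and 3 on each court, broadcast to every row
        dfx['meanSpeedCourtA13'] = dfx.loc[served_13 & (dfx.court == 'A'), 'speed'].mean()
        dfx['meanSpeedCourtD13'] = dfx.loc[served_13 & (dfx.court == 'D'), 'speed'].mean()
        return dfx
    
    # Apply func to each match separately, then stack the results
    newdf = pd.concat(func(dfx) for _, dfx in df.groupby('matchID'))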
    
    print(newdf)
    

    Returns

       court  matchID  server  speed  meanSpeedCourtA13  meanSpeedCourtD13
    0      A        1       1    100             110.00             205.00
    1      D        1       2    200             110.00             205.00
    2      D        1       3    300             110.00             205.00
    3      A        1       4    100             110.00             205.00
    4      A        1       1    120             110.00             205.00
    5      A        1       2    250             110.00             205.00
    6      D        1       3    110             110.00             205.00
    7      D        1       4    100             110.00             205.00
    8      A        2       1    100             110.00             205.00
    9      D        2       2    200             110.00             205.00
    10     D        2       3    300             110.00             205.00
    11     A        2       4    100             110.00             205.00
    12     A        2       1    120             110.00             205.00
    13     A        2       2    250             110.00             205.00
    14     D        2       3    110             110.00             205.00
    15     D        2       4    100             110.00             205.00
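
    The answer mentions transform as the usual tool. A minimal sketch of that idea follows, assuming the same column names as above; the small sample frame and its values are made up for illustration and are not the asker's data. Speeds that fail the condition are masked to NaN first, so the per-match mean ignores them:

```python
import pandas as pd

# Illustrative sample frame (hypothetical values, same column names as above).
df = pd.DataFrame({
    "matchID": [1, 1, 1, 1, 2, 2, 2, 2],
    "server":  [1, 2, 3, 1, 1, 3, 3, 4],
    "court":   ["A", "D", "D", "A", "A", "D", "A", "D"],
    "speed":   [100, 200, 300, 120, 150, 210, 170, 90],
})

for court in ("A", "D"):
    # Keep speed only where the row matches the condition; other rows become NaN.
    mask = df["server"].isin([1, 3]) & (df["court"] == court)
    masked_speed = df["speed"].where(mask)
    # transform("mean") skips NaN and broadcasts each match's mean to all its rows.
    df[f"meanSpeedCourt{court}13"] = masked_speed.groupby(df["matchID"]).transform("mean")

print(df)
```

    This avoids both the helper function and pd.concat, and the row order of df is left untouched.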
    
  • 2021-02-09 17:51

    With groupby, we can still use loc to select the rows we want to fill, and put the whole computation inside a for loop over df.groupby("matchID").

    # match_id is used instead of id to avoid shadowing the built-in id()
    for match_id, subg in df.groupby("matchID"):
        df.loc[df.matchID==match_id, "meanSpeedCourtA13"] = (subg
                  .where(subg.server.isin([1,3])).where(subg.court == "A").speed.mean())
        df.loc[df.matchID==match_id, "meanSpeedCourtD13"] = (subg
                  .where(subg.server.isin([1,3])).where(subg.court == "D").speed.mean())
    

    Special thanks to @Dark for pointing out that I was hard-coding the groupby.

    loc selects values using information from both axes: rows and columns. By the convention used in the documentation, the row selector comes first and the column selector second. For example, in df.loc[df.matchID==match_id, "meanSpeedCourtD13"], df.matchID==match_id selects the rows whose matchID equals match_id, and "meanSpeedCourtD13" names the column we want to look into.
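
    A small sketch of the rows-first, columns-second form, on a made-up three-row frame (the flag column is a hypothetical name used only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"matchID": [1, 1, 2], "speed": [100, 200, 300]})

# Row selector first (a boolean mask), column selector second (a label).
match1_speeds = df.loc[df["matchID"] == 1, "speed"]

# The same indexing form works on the left side of an assignment.
# Assigning to a column that does not exist yet creates it, with NaN elsewhere.
df.loc[df["matchID"] == 1, "flag"] = 1.0

print(match1_speeds)
print(df)
```

    The second statement is the pattern the loop relies on: each pass fills the new column only for the rows of one match.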

    Side notes on how the mean is calculated:

    • for each group subg:
    • where(subg.server.isin([1,3])) masks the rows whose server is not in [1, 3]; their values become NaN.
    • where(subg.court == "A") then masks the rows played on any other court.
    • finally, mean computes the mean of speed, skipping the NaN values.
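
    To make the masking steps concrete, here is a sketch on a made-up four-row group. It also shows a boolean-mask variant that drops the non-matching rows instead of filling them with NaN; both give the same mean here:

```python
import pandas as pd

# Hypothetical group, standing in for one subg from the loop.
subg = pd.DataFrame({
    "server": [1, 2, 3, 4],
    "court":  ["A", "A", "A", "D"],
    "speed":  [100, 250, 120, 90],
})

# Chained where: non-matching rows are kept but filled with NaN.
via_where = (subg.where(subg["server"].isin([1, 3]))
                 .where(subg["court"] == "A")["speed"].mean())

# Boolean mask: non-matching rows are dropped before averaging.
mask = subg["server"].isin([1, 3]) & (subg["court"] == "A")
via_mask = subg.loc[mask, "speed"].mean()

print(via_where, via_mask)
```

    The mask variant only touches the speed column, whereas where rewrites every column of the group before speed is selected.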

    As an alternative, you can use np.where to assign a value to each matchID in [1, 2]. This works only when matchID takes exactly two values. On my computer it ran at roughly the same speed as the groupby method above. To save space, only the "meanSpeedCourtA13" column is shown.

    import numpy as np
    
    # First we calculate the means
    # Calculate the mean for the group with matchID being 1
    meanSpeedCourtA13_ID1 = (df[df.matchID==1].
                     where(df.server.isin([1,3])).where(df.court == "A").speed.mean())    
    # Calculate the mean for the group with matchID being 2
    meanSpeedCourtA13_ID2 = (df[df.matchID==2].
                     where(df.server.isin([1,3])).where(df.court == "A").speed.mean())
    # Use np.where to allocate values to each matchID in [1, 2]
    df["meanSpeedCourtA13"] = np.where(df.matchID == 1,
                                       meanSpeedCourtA13_ID1, meanSpeedCourtA13_ID2)
    

    np.where(condition, x, y) returns x where condition is True and y elsewhere, element by element. See the np.where documentation for details.
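
    A short sketch of that element-wise behaviour, and of one possible way around the two-value restriction: mapping matchID through a Series of per-match means. The numbers in means are hypothetical placeholders, not computed from the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"matchID": [1, 1, 2, 2, 3]})

# np.where picks element-wise: 110.0 where the condition holds, 160.0 elsewhere.
# Note that matchID 3 also receives the "otherwise" value.
two_way = np.where(df["matchID"] == 1, 110.0, 160.0)

# Hypothetical per-match means, keyed by matchID, to cover more than two matches.
means = pd.Series({1: 110.0, 2: 160.0, 3: 95.0})
df["meanSpeedCourtA13"] = df["matchID"].map(means)

print(two_way)
print(df)
```

    With map, each row looks up its own matchID, so a third match no longer falls into the "otherwise" branch.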

  • 2021-02-09 17:52

    You can compute the means with groupby and pull out each scalar with item(). Note that this groups by court only, so it gives one mean per court across all matches:

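    # Mean speed per court, restricted to servers 1 and 3
    vals = df[df['server'].isin([1,3])].groupby(['court'])['speed'].mean().to_frame()
    
    # Pull each scalar out with item() and broadcast it to every row
    df['A13'] = vals.query("court=='A'")['speed'].item()
    df['D13'] = vals.query("court=='D'")['speed'].item()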
    
        matchID  server court  speed    A13    D13
    0         1       1     A    100  110.0  205.0
    1         1       2     D    200  110.0  205.0
    2         1       3     D    300  110.0  205.0
    3         1       4     A    100  110.0  205.0
    4         1       1     A    120  110.0  205.0
    5         1       2     A    250  110.0  205.0
    6         1       3     D    110  110.0  205.0
    7         1       4     D    100  110.0  205.0
    8         2       1     A    100  110.0  205.0
    9         2       2     D    200  110.0  205.0
    10        2       3     D    300  110.0  205.0
    11        2       4     A    100  110.0  205.0
    12        2       1     A    120  110.0  205.0
    13        2       2     A    250  110.0  205.0
    14        2       3     D    110  110.0  205.0
    15        2       4     D    100  110.0  205.0
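
    If the means are needed per match, as in the other answers, one possible extension is to group by both matchID and court, pivot the courts into columns and join the result back. This is a sketch under that assumption, on a made-up six-row frame:

```python
import pandas as pd

# Illustrative sample frame (hypothetical values).
df = pd.DataFrame({
    "matchID": [1, 1, 1, 2, 2, 2],
    "server":  [1, 3, 2, 1, 3, 3],
    "court":   ["A", "D", "A", "A", "D", "D"],
    "speed":   [100, 300, 250, 150, 210, 190],
})

# Mean speed per (matchID, court), restricted to servers 1 and 3.
means = (df[df["server"].isin([1, 3])]
         .groupby(["matchID", "court"])["speed"].mean()
         .unstack("court")       # one column per court: A, D
         .add_suffix("13"))      # rename the columns to A13, D13

# Broadcast the per-match means back onto every row of that match.
result = df.join(means, on="matchID")

print(result)
```

    A match with no qualifying rows for a court would end up with NaN in that column, rather than raising as item() does on an empty selection.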
    