Pandas: Creating aggregated column in DataFrame

栀梦 2020-12-05 04:54

With the DataFrame below as an example,

In [83]:
df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
df
Out[83]:
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25

4 Answers
  • 2020-12-05 05:11
    In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
    
    In [21]: df
    Out[21]:
       A  B  values
    0  1  1      10
    1  1  2      15
    2  2  1      20
    3  2  2      25
    
    In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform('sum')
    
    In [23]: df
    Out[23]:
       A  B  values  sum_values_A
    0  1  1      10            25
    1  1  2      15            25
    2  2  1      20            45
    3  2  2      25            45
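
The session above can be read as follows: `transform` computes the sum once per group and then broadcasts it back to every row of that group, so the result has the same index as `df` and can be assigned directly as a column. A minimal self-contained sketch, assuming the string alias `'sum'` in place of `np.sum`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 2, 1, 2],
    "values": np.arange(10, 30, 5),
})

# transform returns one value per original row, aligned on df's index,
# so the per-group sum can be assigned straight back as a new column.
df["sum_values_A"] = df.groupby("A")["values"].transform("sum")

print(df)
```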
    
  • 2020-12-05 05:11

    This is less direct, but I find it intuitive: use map to build a new column from an existing one. The same pattern applies to many other cases:

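    # Per-group totals of 'values', indexed by the unique values of A
    gb = df.groupby('A')['values'].sum()
    
    def getvalue(x):
        # Look up the total for group key x
        return gb[x]
    
    df['sum'] = df['A'].map(getvalue)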
    df
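
As a possible shortening of the map approach: `Series.map` also accepts a Series, in which case each value is looked up in that Series' index, so the helper function may not be needed. A sketch under that assumption (the name `totals` is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 2, 1, 2],
    "values": np.arange(10, 30, 5),
})

# Per-group totals, indexed by the unique values of A.
totals = df.groupby("A")["values"].sum()

# Passing a Series to map looks each value of A up in the index of `totals`.
df["sum"] = df["A"].map(totals)

print(df)
```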
    
  • 2020-12-05 05:26

    I found a way using join:

    In [101]:
    aggregated = df.groupby('A')['values'].sum()
    aggregated.name = 'sum_values_A'
    df.join(aggregated,on='A')
    
    Out[101]:
       A  B  values  sum_values_A
    0  1  1      10            25
    1  1  2      15            25
    2  2  1      20            45
    3  2  2      25            45
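
One point worth noting about the join approach is that `join` returns a new DataFrame and leaves `df` itself unchanged, so the result has to be assigned if the column should persist. A short sketch, with `result` as an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 2, 2],
    "B": [1, 2, 1, 2],
    "values": np.arange(10, 30, 5),
})

# Per-group totals as a named Series; the name becomes the new column label.
aggregated = df.groupby("A")["values"].sum().rename("sum_values_A")

# join matches column A of df against the index of `aggregated`
# and returns a new DataFrame; df is not modified.
result = df.join(aggregated, on="A")

print(result)
```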
    

    Does anyone have a simpler way to do it?

  • 2020-12-05 05:26
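    In [15]: def sum_col(group, col, new_col):
       ....:     # Return a copy of the group with its total broadcast to every row
       ....:     return group.assign(**{new_col: group[col].sum()})
    
    In [16]: # group_keys=False keeps the original index on pandas 2.x
    
    In [17]: df.groupby("A", group_keys=False).apply(sum_col, 'values', 'sum_values_A')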
    Out[17]: 
       A  B  values  sum_values_A
    0  1  1      10            25
    1  1  2      15            25
    2  2  1      20            45
    3  2  2      25            45
    