Pandas - GroupBy and then Merge on original table

半阙折子戏 2020-12-24 01:28

I'm trying to write a function to aggregate and perform various stats calculations on a dataframe in Pandas and then merge it to the original dataframe; however, I'm running

2 Answers
  • 2020-12-24 02:25

    From the pandas docs:

    Transformation: perform some group-specific computations and return a like-indexed object

    Unfortunately, transform works series by series, so you can't apply multiple functions to multiple columns in one call the way you did with agg. What transform does give you is a result aligned with the original rows, which lets you skip the merge entirely.

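    import numpy as np

    # Group once, then broadcast each group-level result back onto the original rows.
    po_grouped_df = pol_df.groupby(['EID', 'PCODE'])
    # The string 'sum' selects pandas' own group sum instead of Python's builtin sum.
    pol_df['sum_pval'] = po_grouped_df['PVALUE'].transform('sum')
    pol_df['func_si'] = po_grouped_df['SI'].transform(lambda x: np.sqrt(np.sum(x * np.exp(x - 1))))
    pol_df['sum_sc'] = po_grouped_df['SC'].transform('sum')
    pol_df['sum_ee'] = po_grouped_df['EE'].transform('sum')
    pol_df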
    

    Results in:

    PID EID PCODE   PVALUE  SI  SC  EE      sum_pval    func_si         sum_sc  sum_ee
    1   123 GU      100     400 230 10000   250         8.765549e+87    443     12000
    1   123 GR      50      40  23  10000   350         1.805222e+31    236     40000
    2   123 GU      150     140 213 2000    250         8.765549e+87    443     12000
    2   123 GR      300     140 213 30000   350         1.805222e+31    236     40000
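
To make the like-indexed behaviour concrete, here is a minimal self-contained sketch. It rebuilds the four sample rows from the table above and writes the new columns to a copy; the names `keys`, `grouped`, `si_stat` and `result` are introduced here for illustration and are not from the original question.

```python
import numpy as np
import pandas as pd

# Sample rows reconstructed from the results table above.
pol_df = pd.DataFrame({
    "PID": [1, 1, 2, 2],
    "EID": [123, 123, 123, 123],
    "PCODE": ["GU", "GR", "GU", "GR"],
    "PVALUE": [100, 50, 150, 300],
    "SI": [400, 40, 140, 140],
    "SC": [230, 23, 213, 213],
    "EE": [10000, 10000, 2000, 30000],
})

keys = ["EID", "PCODE"]
grouped = pol_df.groupby(keys)


def si_stat(x):
    # Same SI formula as in the answer: sqrt(sum(x * exp(x - 1))).
    return np.sqrt(np.sum(x * np.exp(x - 1)))


# transform returns one value per original row, so plain column
# assignment lines up by index and no merge is needed.
result = pol_df.copy()
result["sum_pval"] = grouped["PVALUE"].transform("sum")
result["func_si"] = grouped["SI"].transform(si_stat)
result["sum_sc"] = grouped["SC"].transform("sum")
result["sum_ee"] = grouped["EE"].transform("sum")

print(result)
```

Every row keeps its original position, and rows that share an (EID, PCODE) pair carry the same group-level value.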
    

    For more info, check out this SO answer.

  • 2020-12-24 02:28

    By default, groupby output has the grouping columns as indices, not columns, which is why the merge is failing.
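
A small sketch of what that looks like, using a cut-down frame with values taken from the sample data on this page (`agg_df` is an illustrative name, not from the question). Whether the merge actually fails depends on the pandas version: recent releases also accept index level names in `on`, so the error may not reproduce there.

```python
import pandas as pd

pol_df = pd.DataFrame({
    "EID": [123, 123, 123, 123],
    "PCODE": ["GU", "GR", "GU", "GR"],
    "PVALUE": [100, 50, 150, 300],
})

# Default behaviour (as_index=True): the grouping keys become a MultiIndex.
agg_df = pol_df.groupby(["EID", "PCODE"]).agg({"PVALUE": "sum"})

print(list(agg_df.index.names))  # the keys live in the index
print(list(agg_df.columns))      # only the aggregated column is left
```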

    There are a couple of different ways to handle it. Probably the easiest is to pass as_index=False when you define the groupby object.

    po_grouped_df = poagg_df.groupby(['EID','PCODE'], as_index=False)
    

    Then, your merge should work as expected.

    In [356]: pd.merge(acc_df, pol_df, on=['EID','PCODE'], how='inner',suffixes=('_Acc','_Po'))
    Out[356]: 
       EID PCODE  SC_Acc  EE_Acc        SI_Acc  PVALUE_Acc  EE_Po  PVALUE_Po  \
    0  123    GR     236   40000  1.805222e+31         350  10000         50   
    1  123    GR     236   40000  1.805222e+31         350  30000        300   
    2  123    GU     443   12000  8.765549e+87         250  10000        100   
    3  123    GU     443   12000  8.765549e+87         250   2000        150   
    
       SC_Po  SI_Po  
    0     23     40  
    1    213    140  
    2    230    400  
    3    213    140  
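
For reference, a self-contained sketch of the whole aggregate-then-merge flow, assuming the sample rows shown on this page. The aggregation spec is a guess at what the question's function computes (sums plus the SI formula from the other answer); `agg_spec`, `si_stat` and `acc_df_alt` are names introduced here. It also shows `reset_index()` as one of the other ways to turn the keys back into columns.

```python
import numpy as np
import pandas as pd

pol_df = pd.DataFrame({
    "PID": [1, 1, 2, 2],
    "EID": [123, 123, 123, 123],
    "PCODE": ["GU", "GR", "GU", "GR"],
    "PVALUE": [100, 50, 150, 300],
    "SI": [400, 40, 140, 140],
    "SC": [230, 23, 213, 213],
    "EE": [10000, 10000, 2000, 30000],
})

keys = ["EID", "PCODE"]


def si_stat(x):
    # SI formula used in the transform answer: sqrt(sum(x * exp(x - 1))).
    return np.sqrt(np.sum(x * np.exp(x - 1)))


# Assumed aggregation spec, one function per column.
agg_spec = {"PVALUE": "sum", "SI": si_stat, "SC": "sum", "EE": "sum"}

# Option 1: keep the keys as ordinary columns from the start.
acc_df = pol_df.groupby(keys, as_index=False).agg(agg_spec)

# Option 2: group with the default index, then move the keys back to columns.
acc_df_alt = pol_df.groupby(keys).agg(agg_spec).reset_index()

# Overlapping column names get the suffixes; the keys and PID do not.
merged = pd.merge(acc_df, pol_df, on=keys, how="inner",
                  suffixes=("_Acc", "_Po"))
print(merged)
```

Either option leaves EID and PCODE as regular columns, so `on=['EID','PCODE']` resolves on both sides of the merge.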
    