How to implode (reverse of pandas explode) based on a column

[愿得一人] 2021-01-19 09:14

I have a DataFrame df like the one below:

  NETWORK       config_id       APPLICABLE_DAYS  Case    Delivery  
0   Grocery     5399            SUN               10              


        
2 Answers
  • 2021-01-19 09:43

    Your expected results look more like a sum than an average. The solution below uses named aggregation:

    df.groupby(["NETWORK", "config_id"]).agg(
        APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
        Total_Cases=("Case", "sum"),
        Total_Delivery=("Delivery", "sum"),
    )
    
                       APPLICABLE_DAYS  Total_Cases  Total_Delivery
    NETWORK config_id
    Grocery 5399       SUN,MON,TUE,WED          100              10
    

    If you want the mean instead, change 'sum' to 'mean':

    df.groupby(["NETWORK", "config_id"]).agg(
        APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
        Avg_Cases=("Case", "mean"),
        Avg_Delivery=("Delivery", "mean"),
    )
    
                       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
    NETWORK config_id
    Grocery 5399       SUN,MON,TUE,WED         25           2.5
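
    The named aggregation above can be tried end to end with the sketch below. The question's table is cut off after its first row, so the four input rows here are assumed values, chosen only so that the totals and means match the outputs shown in this answer. Named aggregation needs pandas 0.25 or later.

```python
import pandas as pd

# Hypothetical input: only the first row (SUN, Case 10) is visible in the
# question; the other rows are assumptions that reproduce the totals above.
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

# Named aggregation: output_column=(input_column, aggregation).
# Several output columns may be derived from the same input column.
imploded = df.groupby(["NETWORK", "config_id"]).agg(
    APPLICABLE_DAYS=("APPLICABLE_DAYS", ",".join),
    Total_Cases=("Case", "sum"),
    Avg_Cases=("Case", "mean"),
    Total_Delivery=("Delivery", "sum"),
    Avg_Delivery=("Delivery", "mean"),
)
print(imploded)
```

    The result keeps NETWORK and config_id as a MultiIndex; append .reset_index() if you would rather have them as ordinary columns.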
    
  • 2021-01-19 09:56

    If you want the "opposite" of explode, that means collecting the values back into a list, as in Solution #1. You can also join them into a string, as in Solution #2:

    Solution #1: Use lambda x: x.tolist() for the 'APPLICABLE_DAYS' column within your .agg groupby function:

    df = (df.groupby(['NETWORK','config_id'])
          .agg({'APPLICABLE_DAYS': lambda x: x.tolist(),'Case':'mean','Delivery':'mean'})
          .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
          .reset_index())
    df
    Out[1]: 
       NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
    0  Grocery       5399  [SUN, MON, TUE, WED]         25           2.5
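
    To check that the list version really is the inverse of explode, a round trip can be sketched as follows. The input rows are assumed values (the question's table is truncated), and passing the built-in list is used here as a shorter stand-in for lambda x: x.tolist().

```python
import pandas as pd

# Hypothetical input rows; the values are assumptions, not the asker's data.
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

# "Implode": collect each group's days into a Python list.
imploded = (
    df.groupby(["NETWORK", "config_id"], as_index=False)
      .agg({"APPLICABLE_DAYS": list, "Case": "mean", "Delivery": "mean"})
)

# explode() expands the list column back to one row per day.
# The aggregated Case/Delivery values are repeated on every row.
exploded = imploded.explode("APPLICABLE_DAYS").reset_index(drop=True)
print(exploded)
```

    Note that only the APPLICABLE_DAYS column round-trips: the per-day Case and Delivery values are lost once they have been averaged.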
    

    Solution #2: Use lambda x: ",".join(x) for the 'APPLICABLE_DAYS' column within your .agg groupby function:

    df = (df.groupby(['NETWORK','config_id'])
          .agg({'APPLICABLE_DAYS': lambda x: ",".join(x),'Case':'mean','Delivery':'mean'})
          .rename({'Case' : 'Avg_Cases','Delivery' : 'Avg_Delivery'},axis=1)
          .reset_index())
    df
    Out[1]: 
       NETWORK  config_id       APPLICABLE_DAYS  Avg_Cases  Avg_Delivery
    0  Grocery       5399       SUN,MON,TUE,WED         25           2.5
    

    If you are looking for the sum, just change 'mean' to 'sum' for the Case and Delivery columns, and adjust the names passed to .rename so they no longer say "Avg".
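
    One way to keep the aggregation and the column names in step is a small wrapper. This is only a sketch: implode_days, its how parameter, and the Total_ prefix are hypothetical names introduced for illustration, and the input rows are assumed values.

```python
import pandas as pd

def implode_days(df: pd.DataFrame, how: str = "mean") -> pd.DataFrame:
    """Hypothetical helper: join APPLICABLE_DAYS per group and aggregate
    Case and Delivery with `how`, which must be "mean" or "sum"."""
    prefix = {"mean": "Avg", "sum": "Total"}[how]
    return (
        df.groupby(["NETWORK", "config_id"])
          .agg({"APPLICABLE_DAYS": ",".join, "Case": how, "Delivery": how})
          .rename(columns={"Case": f"{prefix}_Cases",
                           "Delivery": f"{prefix}_Delivery"})
          .reset_index()
    )

# Hypothetical input rows; the values are assumptions, not the asker's data.
df = pd.DataFrame({
    "NETWORK": ["Grocery"] * 4,
    "config_id": [5399] * 4,
    "APPLICABLE_DAYS": ["SUN", "MON", "TUE", "WED"],
    "Case": [10, 20, 30, 40],
    "Delivery": [1, 2, 3, 4],
})

totals = implode_days(df, how="sum")
print(totals)
```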
