Merge dataframe with aggregation

独厮守ぢ 2021-01-15 22:52

I want to aggregate a dataframe: take the first row of every group and, at the same time, concatenate the values in column 'upc':

df = pd.DataFrame({
            


        
1 answer
  • 2021-01-15 23:44

    I think you need to pass as_index=False to groupby before calling first(), and add reset_index() to concat_upcs_df, so that both results are DataFrames with id1 and id2 as regular columns:

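    # First row of each (id1, id2) group, with the keys kept as columns
    firsts_df = df.groupby(['id1', 'id2'], as_index=False).first()
    # Join all upc values of each group with '|' into a column named 'val'
    concat_upcs_df = df[['id1', 'id2', 'upc']].groupby(['id1', 'id2']).apply(lambda x: '|'.join(x.upc)).reset_index(name='val')
    # Assign the merge result so the printed frame includes 'val'
    df = firsts_df.merge(concat_upcs_df, how='inner', left_on=['id1', 'id2'], right_on=['id1', 'id2'])
    print(df)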
       id1  id2  upc   value1              val
    0    1   11  100   1first          100|102
    1    1   22  101  1second              101
    2    2   11  103   2first              103
    3    2   22  104  2second              104
    4    3   33  105   3first  105|106|107|108
    5    4   44  109   4first          109|110
    6    5   55  111   5first              111
    7    6   22  114   6third              114
    8    6   66  112   6first          112|113
    9    7   77  115   7first          115|116
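
To see why as_index=False matters, here is a minimal sketch on made-up data (the frame `sample` and its values are hypothetical, since the frame in the question is cut off). It only compares where the group keys end up with and without the flag:

```python
import pandas as pd

# Hypothetical sample data; the frame in the question is truncated.
sample = pd.DataFrame({
    'id1': [1, 1, 2],
    'id2': [11, 11, 22],
    'upc': ['100', '102', '103'],
    'value1': ['1first', '1second', '2first'],
})

# Default: the group keys move into a MultiIndex on the result.
indexed = sample.groupby(['id1', 'id2']).first()

# as_index=False: the group keys stay as ordinary columns.
flat = sample.groupby(['id1', 'id2'], as_index=False).first()

print(list(indexed.index.names), list(indexed.columns))
print(list(flat.columns))
```

With the keys kept as columns, the later merge can match on id1 and id2 by name on both sides.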
    

    You can also use drop_duplicates instead of first, and call apply without a lambda. merge also works with on, because the joined columns have the same names on the left and right:

    firsts_df = df.drop_duplicates(['id1', 'id2'])
    concat_upcs_df = df.groupby(['id1', 'id2'])['upc'].apply('|'.join).reset_index(name='val')
    df = firsts_df.merge(concat_upcs_df, on=['id1', 'id2'])
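
For anyone who wants to run the second variant end to end, here is a self-contained sketch. The data is invented for illustration (the question's frame is truncated), and it assumes upc may be numeric, so the column is cast to str before joining; if upc already holds strings, that line is not needed. The name `result` is also just for this example.

```python
import pandas as pd

# Hypothetical sample data (not the question's frame, which is truncated).
df = pd.DataFrame({
    'id1': [1, 1, 1, 2],
    'id2': [11, 11, 22, 11],
    'upc': [100, 102, 101, 103],
    'value1': ['1first', '1dup', '1second', '2first'],
})

# Assumption: upc is numeric here, and str.join needs strings.
df['upc'] = df['upc'].astype(str)

# Keep the first row of each (id1, id2) pair.
firsts_df = df.drop_duplicates(['id1', 'id2'])

# Join the upc values of each group with '|' into a 'val' column.
concat_upcs_df = (
    df.groupby(['id1', 'id2'])['upc']
      .apply('|'.join)
      .reset_index(name='val')
)

# Both frames share the key column names, so `on` is enough.
result = firsts_df.merge(concat_upcs_df, on=['id1', 'id2'])
print(result)
```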
    