I want to aggregate a DataFrame: get the first row of every group and, at the same time, concatenate the values in column 'upc':
df = pd.DataFrame({
I think you need as_index=False in the groupby before first(), and reset_index() on concat_upcs_df, so that both results are DataFrames:
firsts_df = df.groupby(['id1', 'id2'], as_index=False).first()
concat_upcs_df = df[['id1', 'id2', 'upc']].groupby(['id1', 'id2']).apply(lambda x: '|'.join(x.upc)).reset_index(name='val')
df = firsts_df.merge(concat_upcs_df, how='inner', left_on=['id1', 'id2'], right_on=['id1', 'id2'])
print(df)
   id1  id2  upc   value1              val
0    1   11  100   1first          100|102
1    1   22  101  1second              101
2    2   11  103   2first              103
3    2   22  104  2second              104
4    3   33  105   3first  105|106|107|108
5    4   44  109   4first          109|110
6    5   55  111   5first              111
7    6   22  114   6third              114
8    6   66  112   6first          112|113
9    7   77  115   7first          115|116
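For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The sample frame is hypothetical (the question's original data is not shown in full), and the upc values are assumed to be strings so that '|'.join works on them directly.

```python
import pandas as pd

# Hypothetical sample data; not the frame from the question.
df = pd.DataFrame({
    "id1": [1, 1, 1, 2],
    "id2": [11, 11, 22, 11],
    "upc": ["100", "102", "101", "103"],  # assumed to be strings
    "value1": ["1first", "1dup", "1second", "2first"],
})

# First row of every (id1, id2) group; as_index=False keeps the keys as columns.
firsts_df = df.groupby(["id1", "id2"], as_index=False).first()

# One '|'-joined string of upc values per group; reset_index turns the
# resulting Series into a DataFrame with a 'val' column.
concat_upcs_df = (
    df.groupby(["id1", "id2"])["upc"]
      .apply(lambda s: "|".join(s))
      .reset_index(name="val")
)

result = firsts_df.merge(
    concat_upcs_df, how="inner",
    left_on=["id1", "id2"], right_on=["id1", "id2"],
)
print(result)
```

Both intermediate frames carry id1 and id2 as ordinary columns, which is what lets the merge line them up.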
You can also use drop_duplicates instead of first, and apply without a lambda. merge also works with on, because the left and right join columns are the same:
firsts_df = df.drop_duplicates(['id1', 'id2'])
concat_upcs_df = df.groupby(['id1', 'id2'])['upc'].apply('|'.join).reset_index(name='val')
df = firsts_df.merge(concat_upcs_df, on=['id1', 'id2'])
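A runnable sketch of this shorter variant follows, again on hypothetical data. Here upc is assumed to be numeric, in which case it has to be cast to str first, because str.join raises a TypeError on non-string items. The name keys is just a helper introduced for the example.

```python
import pandas as pd

# Hypothetical sample data with a numeric upc column (an assumption).
df = pd.DataFrame({
    "id1": [1, 1, 2],
    "id2": [11, 11, 22],
    "upc": [100, 102, 104],
    "value1": ["1first", "1dup", "2second"],
})

keys = ["id1", "id2"]  # helper name introduced for this sketch

# Keep the first row of each key pair as it appears in the frame.
firsts_df = df.drop_duplicates(keys)

# Cast upc to str, then join the values of each group with '|'.
concat_upcs_df = (
    df.assign(upc=df["upc"].astype(str))
      .groupby(keys)["upc"]
      .apply("|".join)
      .reset_index(name="val")
)

# The key columns have the same names on both sides, so on= is enough.
result = firsts_df.merge(concat_upcs_df, on=keys)
print(result)
```

One difference to keep in mind: groupby(...).first() takes the first non-null value per column, while drop_duplicates keeps the first row as a whole, so the two can differ when the data contains missing values.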