Collapsing rows in a Pandas dataframe

梦谈多话 2021-02-09 20:27

I'm trying to collapse rows in a dataframe that contains a column of ID data and a number of columns that each hold a different string. It looks like groupby is the solution, b

2 answers

無奈伤痛 (original poster)

2021-02-09 20:37

You can use groupby and aggregate each column with ''.join, sum or max:

# If blank values are NaN, replace them with '' first
df = df.fillna('')

df = df.groupby('ID').agg(''.join)
print(df)
     apples  pears  oranges
ID                         
101                 oranges
134  apples  pears         
576          pears  oranges
837  apples
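
For anyone who wants to run this end to end, here is a minimal sketch. The question's input data is not shown here, so the sample frame below is an assumption, made up so that its collapsed form has the same shape as the output above:

```python
import numpy as np
import pandas as pd

# Hypothetical input (not from the question): several rows per ID,
# with NaN wherever a cell is blank.
df = pd.DataFrame({
    "ID": [101, 134, 134, 576, 576, 837],
    "apples": [np.nan, "apples", np.nan, np.nan, np.nan, "apples"],
    "pears": [np.nan, np.nan, "pears", "pears", np.nan, np.nan],
    "oranges": ["oranges", np.nan, np.nan, np.nan, "oranges", np.nan],
})

# Replace NaN with '' so that ''.join only ever sees strings.
filled = df.fillna("")

# Within each ID group, concatenate the strings of every column.
collapsed = filled.groupby("ID").agg("".join)
print(collapsed)
```

Without the fillna step, ''.join would raise a TypeError on the float NaN values.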

Using sum, or alternatively max, also works:

df = df.fillna('')
df = df.groupby('ID').sum()
# alternatively, max
# df = df.groupby('ID').max()
print(df)
     apples  pears  oranges
ID                         
101                 oranges
134  apples  pears         
576          pears  oranges
837  apples

If you also need to remove duplicates per group and per column, add unique:

df = df.groupby('ID').agg(lambda x: ''.join(x.unique()))
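
To see what unique changes, here is a small sketch on made-up data (the frame is an assumption, not the question's data) in which one ID repeats the same string:

```python
import pandas as pd

# Hypothetical input: ID 134 lists "apples" twice; blanks are already ''.
df = pd.DataFrame({
    "ID": [134, 134, 134, 576],
    "apples": ["apples", "apples", "", ""],
    "pears": ["", "", "pears", "pears"],
})

# Plain join concatenates every value, including repeats.
plain = df.groupby("ID").agg("".join)

# Joining only the unique values of each group drops the repeats.
deduped = df.groupby("ID").agg(lambda x: "".join(x.unique()))

print(plain.loc[134, "apples"])    # applesapples
print(deduped.loc[134, "apples"])  # apples
```

The deduplication is per group and per column, so "pears" for ID 134 and "pears" for ID 576 are both kept.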
