I'm trying to collapse rows in a dataframe that contains a column of ID data and a number of columns that each hold a different string. It looks like groupby is the solution, b
You can use groupby and aggregate with ''.join, sum, or max:
# If the blank values are NaN, replace them with '' first
df = df.fillna('')
df = df.groupby('ID').agg(''.join)
print(df)
     apples  pears  oranges
ID
101                 oranges
134  apples  pears
576          pears  oranges
837  apples
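For reference, here is a self-contained version you can paste and run. The input frame is an assumption: the question's data isn't shown, so I rebuilt a plausible one from the output above (one string per row, blanks as NaN). The name `collapsed` is just for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed from the printed result:
# each row holds at most one string, every other cell is NaN.
df = pd.DataFrame({
    'ID':      [101, 134, 134, 576, 576, 837],
    'apples':  [np.nan, 'apples', np.nan, np.nan, np.nan, 'apples'],
    'pears':   [np.nan, np.nan, 'pears', 'pears', np.nan, np.nan],
    'oranges': ['oranges', np.nan, np.nan, np.nan, 'oranges', np.nan],
})

# str.join fails on NaN (a float), so replace NaN with '' first,
# then concatenate the strings of each column within each ID.
collapsed = df.fillna('').groupby('ID').agg(''.join)
print(collapsed)
```

Without the fillna step, ''.join would raise a TypeError on the first NaN it meets, which is why the replacement comes first.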
This also works:
df = df.fillna('')
df = df.groupby('ID').sum()
# Alternatively, use max
# df = df.groupby('ID').max()
print(df)
     apples  pears  oranges
ID
101                 oranges
134  apples  pears
576          pears  oranges
837  apples
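One thing to keep in mind with max: it gives the same result as ''.join only when each group has at most one non-empty string per column, because '' sorts before any non-empty string. If a group can hold two different strings in the same column, max keeps just the largest one. A minimal sketch with made-up data (the column name `fruit` is hypothetical):

```python
import pandas as pd

# Hypothetical data: one ID with two different strings in the same column.
df = pd.DataFrame({
    'ID':    [1, 1],
    'fruit': ['apples', 'pears'],
})

joined = df.groupby('ID').agg(''.join)   # concatenates both strings
largest = df.groupby('ID').max()         # keeps only the largest string

print(joined)
print(largest)
```

So if your columns can contain more than one value per ID, prefer ''.join (or a separator such as ', '.join) over max.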
If you also need to remove duplicates within each group and column, add unique:
df = df.groupby('ID').agg(lambda x: ''.join(x.unique()))
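To see what unique changes, here is a small sketch with a repeated value. The data is made up for illustration; blanks are already '' so no fillna is needed.

```python
import pandas as pd

# Hypothetical data: 'apples' appears twice for the same ID.
df = pd.DataFrame({
    'ID':     [134, 134, 134],
    'apples': ['apples', 'apples', ''],
    'pears':  ['', '', 'pears'],
})

# Plain join repeats the duplicated string.
plain = df.groupby('ID').agg(''.join)

# Joining only the unique values of each column keeps it once.
deduped = df.groupby('ID').agg(lambda x: ''.join(x.unique()))

print(plain)
print(deduped)
```

unique keeps values in order of first appearance, so the joined string follows the original row order.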