Dataframe:
   one two
a    1   x
b    1   y
c    2   y
d    2   z
e    3   z
grp = DataFrame.groupby('one')
grp.agg(lambda x: ???) #or equivalent function
Desired o
The pandas documentation describes a better way to concatenate strings: Series.str.cat.
So I prefer this approach:
In [1]: df.groupby('one').agg(lambda x: x.str.cat(sep='|'))
Out[1]:
     two
one
1    x|y
2    y|z
3      z
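One practical difference worth illustrating is how str.cat treats missing values. The sketch below uses hypothetical data (the question's frame with one value replaced by NaN, which is not in the original post) and selects the 'two' column explicitly. As far as the pandas documentation describes it, str.cat called with only sep omits missing values, while na_rep substitutes a placeholder.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the question's frame with one missing value added.
df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": ["x", np.nan, "y", "z", "z"]},
    index=list("abcde"),
)

# With only `sep`, str.cat leaves missing values out of the result.
skipped = df.groupby("one")["two"].agg(lambda x: x.str.cat(sep="|"))

# With `na_rep`, missing values are replaced by the given placeholder.
filled = df.groupby("one")["two"].agg(lambda x: x.str.cat(sep="|", na_rep="?"))

print(skipped)
print(filled)
```

By contrast, "|".join raises a TypeError when a group contains NaN, because NaN is a float rather than a string.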
You were so close:
In [1]: df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Out[1]:
     two
one
1    x|y
2    y|z
3      z
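The join approach assumes every value is already a string. As a minimal sketch with a hypothetical numeric column named num (not part of the question), converting each group with astype(str) first keeps the same pattern working:

```python
import pandas as pd

# Hypothetical frame: "num" holds integers instead of strings.
df = pd.DataFrame({"one": [1, 1, 2], "num": [10, 20, 30]})

# str.join only accepts strings, so convert each group's values first.
joined = df.groupby("one")["num"].agg(lambda x: "|".join(x.astype(str)))

print(joined)
```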
Expanded answer that sorts the values and keeps only the unique ones:
In [1]: df = DataFrame({'one':[1,1,2,2,3], 'two':list('xyyzz'), 'three':list('eecba')}, index=list('abcde'), columns=['one','two','three'])
In [2]: df
Out[2]:
   one two three
a    1   x     e
b    1   y     e
c    2   y     c
d    2   z     b
e    3   z     a
In [3]: df.groupby('one').agg(lambda x: "|".join(x.sort_values().unique().tolist()))  # Series.order() was removed from pandas; sort_values() replaces it
Out[3]:
     two three
one
1    x|y     e
2    y|z   b|c
3      z     a
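The same sort-and-deduplicate step can also be written with Python builtins. This is an alternative spelling rather than the answer's own method, and the helper name join_sorted_unique is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz"), "three": list("eecba")},
    index=list("abcde"),
    columns=["one", "two", "three"],
)

def join_sorted_unique(values):
    # set() drops duplicates; sorted() puts what remains in order.
    return "|".join(sorted(set(values)))

result = df.groupby("one").agg(join_sorted_unique)
print(result)
```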
Just an elaboration on the accepted answer:
df.groupby('one').agg(lambda x: "|".join(x.tolist()))
Note that df.groupby('one') returns a DataFrameGroupBy object (selecting a single column, as in df.groupby('one')['two'], gives a SeriesGroupBy). When agg is given a function, it calls that function on each column of each group, and each call receives a Series. This means that x in the lambda above is a Series.
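A quick way to check what agg passes to the function is to record the type of each argument. This is only a sketch: the helper join_and_record and the seen_types set are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz")},
    index=list("abcde"),
)

seen_types = set()

def join_and_record(x):
    # Record the type of object agg passes in, then join as before.
    seen_types.add(type(x).__name__)
    return "|".join(x.tolist())

grouped = df.groupby("one")
result = grouped.agg(join_and_record)

print(type(grouped).__name__)  # type of the groupby object itself
print(seen_types)              # types that were passed to the function
```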
Also, the aggregation function does not have to be a lambda. If the aggregation logic is complex, it can be defined separately as a regular function, as below. The only constraint is that x must be a Series (or something compatible with it):
def myfun1(x):
    return "|".join(x.tolist())
and then:
df.groupby('one').agg(myfun1)
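A named function also works with pandas named aggregation, which lets you choose the output column name. A minimal sketch, assuming pandas 0.25 or later; the names join_values and two_joined are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {"one": [1, 1, 2, 2, 3], "two": list("xyyzz")},
    index=list("abcde"),
)

def join_values(x):
    """Join the values of one group's Series with '|'."""
    return "|".join(x.tolist())

# Named aggregation: output_column=(input_column, function).
result = df.groupby("one").agg(two_joined=("two", join_values))
print(result)
```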