I want to add an aggregate, grouped, nunique column to my pandas dataframe but not aggregate the entire dataframe. I'm trying to do this in one line and avoid creating a ne
df.groupby(['track', 'type'])['id'].transform(nunique)
implies that there is a name nunique in the namespace that refers to some function. Unless you have defined one yourself, Python raises a NameError. transform accepts either a function or a string naming a function it knows, and 'nunique' is definitely one of those strings.
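The difference between the bare name and the string can be seen in a minimal sketch. The small frame below is made up for illustration and is not the asker's data.

```python
import pandas as pd

# Small made-up frame, for illustration only.
df = pd.DataFrame({
    "track": list("1122"),
    "type": list("AABB"),
    "id": list("XYZZ"),
})
grouped = df.groupby(["track", "type"])["id"]

# Bare name: Python tries to look up a variable called nunique
# before transform is ever called, and fails if none is defined.
try:
    grouped.transform(nunique)
    error_name = None
except NameError as exc:
    error_name = type(exc).__name__

# String: pandas resolves the name to its own group-wise nunique.
by_string = grouped.transform("nunique")

print(error_name)
print(by_string.tolist())
```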
As pointed out by @root, the methods that pandas uses to perform a transformation indicated by these strings are often optimized and should generally be preferred to passing your own functions. This is true even for passing numpy functions in some cases. For example, transform('sum') should be preferred over transform(sum).
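As a rough illustration of that point, the sketch below compares the string form with a user-defined function. The frame and the helper my_sum are hypothetical. Both calls should return the same values, but the string lets pandas pick its own implementation.

```python
import pandas as pd

# Made-up data, for illustration only.
df = pd.DataFrame({"key": list("aabbb"), "value": [1, 2, 3, 4, 5]})
grouped = df.groupby("key")["value"]

# Preferred: the string lets pandas dispatch to its built-in group-wise sum.
by_string = grouped.transform("sum")

# Hypothetical user-defined function: called once per group in Python.
def my_sum(values):
    return values.sum()

by_function = grouped.transform(my_sum)

print(by_string.tolist())
print(by_function.tolist())
```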
Try this instead:
df.groupby(['track', 'type'])['id'].transform('nunique')
Demo:
import pandas as pd

df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))
print(df)
  id track type
0  X     1    A
1  X     1    A
2  Y     1    A
3  Z     1    A
4  W     2    B
5  W     2    B
6  W     2    B
7  W     2    B
df.groupby(['track', 'type'])['id'].transform('nunique')
0 3
1 3
2 3
3 3
4 1
5 1
6 1
7 1
Name: id, dtype: int64
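Because transform returns a Series aligned with the original index, the result can be assigned straight back as a new column in one line, without aggregating the frame. A minimal sketch using the demo data, where the column name n_unique_ids is a hypothetical choice:

```python
import pandas as pd

df = pd.DataFrame(dict(
    track=list('11112222'), type=list('AAAABBBB'), id=list('XXYZWWWW')))

# 'n_unique_ids' is a hypothetical column name; the frame keeps all 8 rows.
df['n_unique_ids'] = df.groupby(['track', 'type'])['id'].transform('nunique')
print(df)
```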