For dataframe
In [2]: df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
...: 'Rank': np.random.randint(0,3,6),
...:
A lot of handy things are stored in the DataFrameGroupBy.grouper
object. For example:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
...                    'Rank': np.random.randint(0,3,6),
...                    'Val': np.random.rand(6)})
>>> grouped = df.groupby(["Name", "Rank"])
>>> grouped.grouper.
grouped.grouper.agg_series grouped.grouper.indices
grouped.grouper.aggregate grouped.grouper.labels
grouped.grouper.apply grouped.grouper.levels
grouped.grouper.axis grouped.grouper.names
grouped.grouper.compressed grouped.grouper.ngroups
grouped.grouper.get_group_levels grouped.grouper.nkeys
grouped.grouper.get_iterator grouped.grouper.result_index
grouped.grouper.group_info grouped.grouper.shape
grouped.grouper.group_keys grouped.grouper.size
grouped.grouper.groupings grouped.grouper.sort
grouped.grouper.groups
and so:
>>> df["GroupId"] = df.groupby(["Name", "Rank"]).grouper.group_info[0]
>>> df
  Name  Rank       Val  GroupId
0  foo     0  0.302482        2
1  bar     0  0.375193        0
2  foo     2  0.965763        4
3  bar     2  0.166417        1
4  foo     1  0.495124        3
5  bar     2  0.728776        1
There may be a nicer alias for grouper.group_info[0]
lurking around somewhere, but this should work anyway.
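To make explicit what those integer ids mean, here is a minimal sketch that rebuilds them by hand without touching the grouper object. It assumes the ids follow the sorted order of the distinct (Name, Rank) pairs, which is what the table above shows. The fixed Rank values are copied from that table, and the names keys, key_to_id and group_id are only illustrative.

```python
import pandas as pd

# Fixed data (Rank values copied from the table above) so the result is reproducible.
df = pd.DataFrame({"Name": ["foo", "bar"] * 3,
                   "Rank": [0, 0, 2, 2, 1, 2]})

# Collect the distinct (Name, Rank) pairs and sort them.
keys = sorted(set(zip(df["Name"], df["Rank"])))

# Map each pair to its position in the sorted list.
key_to_id = {key: i for i, key in enumerate(keys)}

# Look up the id of every row's pair.
group_id = [key_to_id[key] for key in zip(df["Name"], df["Rank"])]
print(group_id)  # [2, 0, 4, 1, 3, 1]
```

This is only meant to show what the numbers encode. For real use, a groupby-based approach will be faster on large frames.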
Use GroupBy.ngroup from pandas 0.20.2+:
df["GroupId"] = df.groupby(["Name", "Rank"]).ngroup()
print(df)
  Name  Rank       Val  GroupId
0  foo     2  0.451724        4
1  bar     0  0.944676        0
2  foo     0  0.822390        2
3  bar     2  0.063603        1
4  foo     1  0.938892        3
5  bar     2  0.332454        1
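A reproducible variant of the same idea, as a sketch: the Rank values are fixed to the ones printed above instead of being drawn at random, and the Val column is left out. It also shows passing sort=False to groupby, which, as far as I can tell, numbers the groups in order of first appearance instead of in sorted key order. The FirstSeenId column name is made up for this illustration.

```python
import pandas as pd

# Fixed Rank values (taken from the printed table) instead of random ones.
df = pd.DataFrame({"Name": ["foo", "bar"] * 3,
                   "Rank": [2, 0, 0, 2, 1, 2]})

# Default (sort=True): ids follow the sorted order of the (Name, Rank) keys.
df["GroupId"] = df.groupby(["Name", "Rank"]).ngroup()

# sort=False: ids follow the order in which each key first appears.
df["FirstSeenId"] = df.groupby(["Name", "Rank"], sort=False).ngroup()

print(df)
```

The two columns describe the same partition of the rows, only numbered differently, so either can serve as a group key later on.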
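Another option is grouper.label_info, an internal attribute that is missing from the tab-completion listing above and so may not exist in every pandas version: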
df["GroupId"] = df.groupby(["Name", "Rank"]).grouper.label_info
This assigns each row of df the integer label of the group it belongs to.