Get group id back into pandas dataframe

别那么骄傲 2020-12-04 18:26

For dataframe

In [2]: df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
   ...:                    'Rank': np.random.randint(0,3,6),
   ...:                     


        
3 Answers
  • 2020-12-04 18:49

    A lot of handy things are stored in the DataFrameGroupBy.grouper object. For example:

    >>> df = pd.DataFrame({'Name': ['foo', 'bar'] * 3,
                       'Rank': np.random.randint(0,3,6),
                       'Val': np.random.rand(6)})
    >>> grouped = df.groupby(["Name", "Rank"])
    >>> grouped.grouper.
    grouped.grouper.agg_series        grouped.grouper.indices
    grouped.grouper.aggregate         grouped.grouper.labels
    grouped.grouper.apply             grouped.grouper.levels
    grouped.grouper.axis              grouped.grouper.names
    grouped.grouper.compressed        grouped.grouper.ngroups
    grouped.grouper.get_group_levels  grouped.grouper.nkeys
    grouped.grouper.get_iterator      grouped.grouper.result_index
    grouped.grouper.group_info        grouped.grouper.shape
    grouped.grouper.group_keys        grouped.grouper.size
    grouped.grouper.groupings         grouped.grouper.sort
    grouped.grouper.groups            
    

    and so:

    >>> df["GroupId"] = df.groupby(["Name", "Rank"]).grouper.group_info[0]
    >>> df
      Name  Rank       Val  GroupId
    0  foo     0  0.302482        2
    1  bar     0  0.375193        0
    2  foo     2  0.965763        4
    3  bar     2  0.166417        1
    4  foo     1  0.495124        3
    5  bar     2  0.728776        1
    

    There may be a nicer alias for grouper.group_info[0] lurking around somewhere, but this should work anyway.
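
Some of the information in the grouper listing is also reachable through public attributes of the GroupBy object itself. The following is a minimal sketch, assuming a reasonably recent pandas; the fixed Rank values are an assumption chosen to match the output shown above, since the original uses random data.

```python
import pandas as pd

# Fixed Rank values (an assumption; the original draws them at random).
df = pd.DataFrame({
    "Name": ["foo", "bar"] * 3,
    "Rank": [0, 0, 2, 2, 1, 2],
})

grouped = df.groupby(["Name", "Rank"])

# Number of distinct (Name, Rank) combinations.
n_groups = grouped.ngroups

# Dict mapping each group key to the positions of its rows.
row_positions = grouped.indices

# Group keys in sorted order; a row's group id is the position of its key here.
group_keys = sorted(grouped.groups)

print(n_groups)
print(group_keys)
print(row_positions[("bar", 2)])
```

Going through these attributes avoids depending on the layout of the internal grouper object, which may differ between pandas versions.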

  • 2020-12-04 18:58

    Use GroupBy.ngroup from pandas 0.20.2+:

    df["GroupId"] = df.groupby(["Name", "Rank"]).ngroup()
    print(df)
      Name  Rank       Val  GroupId
    0  foo     2  0.451724        4
    1  bar     0  0.944676        0
    2  foo     0  0.822390        2
    3  bar     2  0.063603        1
    4  foo     1  0.938892        3
    5  bar     2  0.332454        1
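
For a reproducible check of the ngroup approach, here is a small sketch with fixed values in place of the random ones (the specific Rank and Val numbers are assumptions for illustration only). It also confirms that rows sharing the same (Name, Rank) pair get the same id.

```python
import pandas as pd

# Fixed values instead of random ones so the result is reproducible.
df = pd.DataFrame({
    "Name": ["foo", "bar"] * 3,
    "Rank": [0, 0, 2, 2, 1, 2],
    "Val": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# ngroup() returns one integer per row; with the default sort=True the
# groups are numbered in sorted key order.
df["GroupId"] = df.groupby(["Name", "Rank"]).ngroup()

# Every (Name, Rank) pair should map to exactly one id.
ids_per_key = df.groupby(["Name", "Rank"])["GroupId"].nunique()

print(df)
```

With these inputs, the two rows for ("bar", 2) share a GroupId, and the ids follow the sorted order of the keys rather than the order in which they first appear.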
    
  • 2020-12-04 19:06

    Another option is grouper.label_info, an internal attribute available in older pandas releases:

    df["GroupId"] = df.groupby(["Name", "Rank"]).grouper.label_info
    

    It associates each row of the df dataframe with its corresponding group label.
