Get all keys from GroupBy object in Pandas

小鲜肉 2020-12-28 12:55

I'm looking for a way to get a list of all the keys in a GroupBy object, but I can't find one in the docs or through Google.

There is definitely a way t

3 answers
  • 2020-12-28 13:20

    Pass sort=False to keep the group keys in the order in which they first appear instead of sorting them:

    gp = df.groupby('group', sort=False)
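    To see what sort=False changes, here is a minimal sketch using the same sample frame as the other answers. The variable names sorted_keys and appearance_keys are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': list('bgaaabxeb'), 'val': np.arange(9)})

# Default (sort=True): iterating yields the keys in sorted order.
sorted_keys = [key for key, _ in df.groupby('group')]

# sort=False: keys come out in order of first appearance in the column.
appearance_keys = [key for key, _ in df.groupby('group', sort=False)]

print(sorted_keys)      # ['a', 'b', 'e', 'g', 'x']
print(appearance_keys)  # ['b', 'g', 'a', 'x', 'e']
```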

  • 2020-12-28 13:25

    A problem with EdChum's answer is that calling gp.groups.keys() first constructs the full group dictionary, which maps every key to the index labels of its rows. On large dataframes this is a very slow operation, and it effectively doubles the memory consumption. Iterating over the GroupBy object is much faster:

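    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'group': list('bgaaabxeb'), 'val': np.arange(9)})
    gp = df.groupby('group')
    # Each iteration yields a (key, sub-frame) pair; keep only the key.
    keys = [key for key, _ in gp]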
    

    Executing this list comprehension took me 16 s on my groupby object, while I had to interrupt gp.groups.keys() after 3 minutes.
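    The relative cost of the two approaches depends on the pandas version and on the shape of the data, so it is worth measuring on your own frame. Below is a rough sketch of such a measurement on small synthetic data. The sizes and the helper names keys_by_iteration and keys_from_groups are assumptions for illustration, not figures from the measurement above.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 20,000 rows spread over up to 2,000 groups.
rng = np.random.default_rng(0)
df = pd.DataFrame({'group': rng.integers(0, 2_000, size=20_000),
                   'val': np.arange(20_000)})

def keys_by_iteration():
    # Build a fresh GroupBy on every call so no cached state is reused.
    return [key for key, _ in df.groupby('group')]

def keys_from_groups():
    # Builds the full key -> index-labels dict, then takes its keys.
    return list(df.groupby('group').groups.keys())

t_iter = timeit.timeit(keys_by_iteration, number=3)
t_groups = timeit.timeit(keys_from_groups, number=3)
print(f"iteration: {t_iter:.3f}s  groups.keys(): {t_groups:.3f}s")

# Both approaches should report the same set of keys.
same_keys = sorted(keys_by_iteration()) == sorted(keys_from_groups())
```

    No expected timings are shown because they vary by machine and library version.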

  • 2020-12-28 13:34

    You can access this via the .groups attribute of the groupby object. It returns a dict whose keys are the group keys:

    In [40]:
    df = pd.DataFrame({'group':[0,1,1,1,2,2,3,3,3], 'val':np.arange(9)})
    gp = df.groupby('group')
    gp.groups.keys()
    
    Out[40]:
    dict_keys([0, 1, 2, 3])
    

    Here is the output from groups, which maps each key to the index labels of the rows in that group:

    In [41]:
    gp.groups
    
    Out[41]:
    {0: Int64Index([0], dtype='int64'),
     1: Int64Index([1, 2, 3], dtype='int64'),
     2: Int64Index([4, 5], dtype='int64'),
     3: Int64Index([6, 7, 8], dtype='int64')}
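    As a small sketch of how this mapping can be used, the index labels stored under a key select that group's rows, which should match what get_group returns. The variable names here are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': [0, 1, 1, 1, 2, 2, 3, 3, 3], 'val': np.arange(9)})
gp = df.groupby('group')

# A dict iterates over its keys, so list() gives a plain list of group keys.
keys = list(gp.groups)

# The value stored under a key holds the index labels of that group's rows.
rows_for_1 = df.loc[gp.groups[1]]

# get_group selects the same rows directly by key.
same_rows = gp.get_group(1)

print(keys)
print(rows_for_1)
```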
    

    Update

    It looks like, because groups is a dict, the group order isn't maintained when you call keys() (plain dicts only guarantee insertion order from Python 3.7 onwards):

    In [65]:
    df = pd.DataFrame({'group':list('bgaaabxeb'), 'val':np.arange(9)})
    gp = df.groupby('group')
    gp.groups.keys()
    
    Out[65]:
    dict_keys(['b', 'e', 'g', 'a', 'x'])
    

    If you display groups itself, you can see the keys printed in order:

    In [79]:
    gp.groups
    
    Out[79]:
    {'a': Int64Index([2, 3, 4], dtype='int64'),
     'b': Int64Index([0, 5, 8], dtype='int64'),
     'e': Int64Index([7], dtype='int64'),
     'g': Int64Index([1], dtype='int64'),
     'x': Int64Index([6], dtype='int64')}
    

    A hack to get the keys in that order is to access the .name attribute of each group:

    In [78]:
    gp.apply(lambda x: x.name)
    
    Out[78]:
    group
    a    a
    b    b
    e    e
    g    g
    x    x
    dtype: object
    

    This isn't great because it isn't vectorised. However, if you already have an aggregated object, you can just get the index values:

    In [81]:
    agg = gp.sum()
    agg
    
    Out[81]:
           val
    group     
    a        9
    b       13
    e        7
    g        1
    x        6
    
    In [83]:    
    agg.index.get_level_values(0)
    
    Out[83]:
    Index(['a', 'b', 'e', 'g', 'x'], dtype='object', name='group')
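    If no aggregated object is at hand, a couple of lighter alternatives give the same keys. This is a sketch rather than a recommendation, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': list('bgaaabxeb'), 'val': np.arange(9)})
gp = df.groupby('group')

# size() returns one row count per group, indexed by the sorted group keys.
keys_from_size = gp.size().index.tolist()

# Skipping groupby entirely: the unique values of the grouping column.
keys_from_unique = sorted(df['group'].unique())

# ngroups gives the number of groups without listing them.
n_groups = gp.ngroups

print(keys_from_size)    # ['a', 'b', 'e', 'g', 'x']
print(keys_from_unique)  # ['a', 'b', 'e', 'g', 'x']
print(n_groups)          # 5
```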
    