How does the pandas groupby method actually work?

遇见更好的自我 2020-11-29 12:32

So I was trying to understand the pandas.DataFrame.groupby() method and I came across this example in the documentation:

    In [1]: df = pd.DataFrame({'A' :


        
1 answer
  • 2020-11-29 13:01

    When you use just

    df.groupby('A')
    

    You get a GroupBy object. You haven't applied any function to it at that point. Under the hood, while this definition might not be perfect, you can think of a groupby object as:

    • An iterator of (group, DataFrame) pairs, for DataFrames, or
    • An iterator of (group, Series) pairs, for Series.
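Before iterating, it can help to confirm what kind of object this is and which rows belong to each group. The following is a minimal sketch, assuming the same small frame used in this answer; the names `kind`, `labels`, and `pairs` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})
grouped = df.groupby('A')

# The result is a GroupBy object, not a DataFrame; nothing is aggregated yet.
kind = type(grouped).__name__

# `.groups` maps each group key to the row labels that belong to it.
labels = {key: list(idx) for key, idx in grouped.groups.items()}

# Materializing the iterator exposes the (group, DataFrame) pairs.
pairs = list(grouped)

print(kind)
print(labels)
print(len(pairs))
```

Here `pairs` holds one tuple per distinct value of `A`, which is the shape the loops in this answer rely on.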

    To illustrate:

    import pandas as pd

    df = pd.DataFrame({'A' : [1, 1, 2, 2], 'B' : [1, 2, 3, 4]})
    grouped = df.groupby('A')
    
    # each `i` is a tuple of (group, DataFrame)
    # so your output here will be a little messy
    for i in grouped:
        print(i)
    (1,    A  B
    0  1  1
    1  1  2)
    (2,    A  B
    2  2  3
    3  2  4)
    
    # this version unpacks each tuple into two loop
    # variables.  each `group` is a group key, each
    # `group_df` is its corresponding DataFrame
    # (a separate name, so the original `df` is not overwritten)
    for group, group_df in grouped:
        print('group of A:', group, '\n')
        print(group_df, '\n')
    group of A: 1 
    
       A  B
    0  1  1
    1  1  2 
    
    group of A: 2 
    
       A  B
    2  2  3
    3  2  4 
    
    # and if you just wanted to see the group keys,
    # the second loop variable is a "throwaway"
    for group, _ in grouped:
        print('group of A:', group, '\n')
    group of A: 1 
    
    group of A: 2 
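The Series case from the list above behaves the same way, except that each pair holds a Series rather than a DataFrame. This is a small sketch with made-up data; `s`, `keys`, and `sizes` are hypothetical names, and grouping by a plain list of labels is just one convenient way to group a Series.

```python
import pandas as pd

# Hypothetical data: four values and one group label per value.
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
keys = [1, 1, 2, 2]

# Each iteration yields (group, Series).
for group, sub in s.groupby(keys):
    print('group:', group)
    print(sub, '\n')

# Collect the size of each group to confirm how the values were split.
sizes = {group: len(sub) for group, sub in s.groupby(keys)}
```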
    

    As for .head, have a look at the docs for that method:

    Essentially equivalent to .apply(lambda x: x.head(n))

    So here you're actually applying a function to each group of the groupby object. Keep in mind that .head(5) is applied to each group (each DataFrame) separately, so because every group in your example has five or fewer rows, you get your original DataFrame back.
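To see why the row count per group matters, compare a frame whose groups are all small with one that has a larger group. This is a rough sketch; `small` mirrors the frame used in this answer, while `big` is hypothetical data invented for the comparison.

```python
import pandas as pd

# Every group has at most 5 rows, so head(5) keeps everything.
small = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})
unchanged = small.groupby('A').head(5).equals(small)

# Hypothetical frame: group 'x' has 7 rows, group 'y' has 2.
big = pd.DataFrame({'A': ['x'] * 7 + ['y'] * 2, 'B': list(range(9))})
trimmed = big.groupby('A').head(5)

# Rows kept per group after trimming.
counts = trimmed['A'].value_counts().to_dict()

print(unchanged)
print(counts)
```

Only the group with more than five rows loses anything; the smaller group passes through untouched.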

    Consider this with the example above. If you use .head(1), you get only the first row of each group:

    print(df.groupby('A').head(1))
       A  B
    0  1  1
    2  2  3
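The per-group behaviour can also be reproduced by hand, which makes the "applied to each group" idea concrete. The sketch below takes the first row of each group and stitches the pieces back together; it illustrates the idea only and is not how pandas implements `.head` internally. `manual` and `builtin` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})
grouped = df.groupby('A')

# Take the first row of each group, then concatenate the pieces.
manual = pd.concat([group_df.head(1) for _, group_df in grouped])

# The built-in method gives the same rows for this frame.
builtin = grouped.head(1)

print(manual.equals(builtin))
```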
    