So I was trying to understand pandas.dataFrame.groupby() function and I came across this example on the documentation:
In [1]: df = pd.DataFrame({\'A\' :
When you use just
df.groupby('A')
You get a GroupBy object. You haven't applied any function to it at that point. Under the hood, while this definition might not be perfect, you can think of a groupby
object as:
To illustrate:
df = DataFrame({'A' : [1, 1, 2, 2], 'B' : [1, 2, 3, 4]})
grouped = df.groupby('A')
# each `i` is a tuple of (group, DataFrame)
# so your output here will be a little messy
for i in grouped:
print(i)
(1, A B
0 1 1
1 1 2)
(2, A B
2 2 3
3 2 4)
# this version uses multiple counters
# in a single loop. each `group` is a group, each
# `df` is its corresponding DataFrame
for group, df in grouped:
print('group of A:', group, '\n')
print(df, '\n')
group of A: 1
A B
0 1 1
1 1 2
group of A: 2
A B
2 2 3
3 2 4
# and if you just wanted to visualize the groups,
# your second counter is a "throwaway"
for group, _ in grouped:
print('group of A:', group, '\n')
group of A: 1
group of A: 2
Now as for .head
. Just have a look at the docs for that method:
Essentially equivalent to
.apply(lambda x: x.head(n))
So here you're actually applying a function to each group of the groupby object. Keep in mind .head(5)
is applied to each group (each DataFrame), so because you have less than or equal to 5 rows per group, you get your original DataFrame.
Consider this with the example above. If you use .head(1)
, you get only the first 1 row of each group:
print(df.groupby('A').head(1))
A B
0 1 1
2 2 3