pandas groupby-apply behavior, returning a Series (inconsistent output type)

隐瞒了意图╮

I'm curious about the behavior of pandas groupby-apply when the apply function returns a Series.

When the Series are of different lengths, it returns a multi-indexed Series.

2 Answers

独厮守ぢ

2021-02-05 10:27
In essence, a DataFrame consists of equal-length Series (technically, a dict-like container of Series objects). As the pandas split-apply-combine docs state, a groupby() operation involves one or more of the following steps:
- Splitting the data into groups based on some criteria
- Applying a function to each group independently
- Combining the results into a data structure
Notice that this does not say a DataFrame is always produced, only a generic data structure. So a groupby() operation on a DataFrame can return a Series, and one on a Series can return a DataFrame.
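
As a small illustration of that point, the sketch below shows one groupby() call going from a DataFrame to a Series and another going from a Series to a DataFrame. The data and the "population" column are invented for this example and are not from the question.

```python
import pandas as pd

# Hypothetical data: the "population" column is invented for this illustration.
df = pd.DataFrame({
    "state": ["A", "A", "B", "B", "B"],
    "city": ["v", "w", "x", "y", "z"],
    "population": [10, 20, 30, 40, 50],
})

# DataFrame in, Series out: one row count per group.
sizes = df.groupby("state").size()

# Series in, DataFrame out: several aggregates per group become columns.
summary = df["population"].groupby(df["state"]).agg(["min", "max"])

print(type(sizes).__name__)    # Series
print(type(summary).__name__)  # DataFrame
```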

In your first DataFrame the groups have unequal sizes, so the function returns Series of different lengths (and therefore different indexes). The "combine" step cannot assemble those directly into a DataFrame, so it yields a multi-index Series instead. You can see this by adding a print statement to the function: the state == A group has length 2 and the B group has length 3.
```
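import pandas as pd

# df1 comes from the question; reconstructed here from the printed output below
df1 = pd.DataFrame({'city': list('vwxyz'), 'state': list('AABBB')})

def f(x):
    print(x)  # show each group as it is passed in
    return pd.Series(x['city'].values, index=range(len(x)))

s1 = df1.groupby('state').apply(f)

print(s1)
# Output (group A prints twice on older pandas, which evaluates the first group twice):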
#   city state
# 0    v     A
# 1    w     A
#   city state
# 0    v     A
# 1    w     A
#   city state
# 2    x     B
# 3    y     B
# 4    z     B
# state   
# A      0    v
#        1    w
# B      0    x
#        1    y
#        2    z
# dtype: object
```
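
To make the contrast concrete, here is a minimal sketch comparing equal-sized and unequal-sized groups with the same kind of function. Both frames are made up for the example, and grouping by an external key (rather than by column name) is a choice made here so the grouping column is not passed into each group; the exact behavior may vary between pandas versions.

```python
import pandas as pd

def f(x):
    # Same shape of function as above: positional index 0..len(group)-1.
    return pd.Series(x["city"].values, index=range(len(x)))

# Hypothetical frames: one with equal-sized groups, one with unequal groups.
equal = pd.DataFrame({"city": list("vwxy"), "state": list("AABB")})
unequal = pd.DataFrame({"city": list("vwxyz"), "state": list("AABBB")})

# Grouping by an external key keeps the grouping column out of each group.
wide = equal[["city"]].groupby(equal["state"]).apply(f)
stacked = unequal[["city"]].groupby(unequal["state"]).apply(f)

print(type(wide).__name__)     # DataFrame: every group returned index [0, 1]
print(type(stacked).__name__)  # Series: indexes [0, 1] and [0, 1, 2] differ
```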
However, you can reshape the multi-index Series by resetting its index, which turns the hierarchical levels into ordinary columns:
```
df = df1.groupby('state').apply(f).reset_index()
print(df)

#   state  level_1  0
# 0     A        0  v
# 1     A        1  w
# 2     B        0  x
# 3     B        1  y
# 4     B        2  z
```
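
If the default column names level_1 and 0 are unhelpful, one option is to name the inner level and the values before resetting. The sketch below rebuilds the multi-index Series by hand so it runs on its own; the names "position" and "city" are choices made for this example, not something the original code uses.

```python
import pandas as pd

# Rebuild the multi-index Series from the output above by hand.
index = pd.MultiIndex.from_tuples(
    [("A", 0), ("A", 1), ("B", 0), ("B", 1), ("B", 2)],
    names=["state", None],
)
s1 = pd.Series(list("vwxyz"), index=index)

# Name the inner level and the values before flattening the index.
tidy = s1.rename_axis(["state", "position"]).reset_index(name="city")
print(tidy.columns.tolist())  # ['state', 'position', 'city']
```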
But more relevant to your needs is unstack(), which pivots the innermost level of the index labels into columns, yielding a DataFrame. Consider fillna() to fill the resulting missing value.
```
df = df1.groupby('state').apply(f).unstack()
print(df)

#        0  1     2
# state            
# A      v  w  None
# B      x  y     z
```
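
A minimal sketch of the fillna() step, again rebuilding the multi-index Series by hand so the snippet is self-contained. Filling with an empty string is just one possible choice for the placeholder.

```python
import pandas as pd

# Rebuild the multi-index Series from the earlier output by hand.
index = pd.MultiIndex.from_tuples(
    [("A", 0), ("A", 1), ("B", 0), ("B", 1), ("B", 2)],
    names=["state", None],
)
s1 = pd.Series(list("vwxyz"), index=index)

# Pivot the inner level into columns, then replace the gap left by group A.
wide = s1.unstack().fillna("")
print(wide)
```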
旧巷少年郎

2021-02-05 10:27

Instead of using index=range(len(x)) in your function f, you can use index=x.index to prevent this undesired behavior.
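
A minimal sketch of this suggestion, using a df1 reconstructed from the output printed in the other answer. Note that group_keys=False and grouping by an external key are assumptions added here so the result keeps only the original row labels; they are not part of the suggestion itself, and without them the result may differ between pandas versions.

```python
import pandas as pd

# df1 reconstructed from the printed output in the other answer.
df1 = pd.DataFrame({"city": list("vwxyz"), "state": list("AABBB")})

def f(x):
    # Keep the group's original row labels instead of range(len(x)).
    return pd.Series(x["city"].values, index=x.index)

# group_keys=False is an assumption added here, not part of the answer above.
result = df1[["city"]].groupby(df1["state"], group_keys=False).apply(f)
print(result.index.tolist())
```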
