How can I get pandas' groupby command to return a DataFrame instead of a Series?

后端未结

关注

 2  824

I don\'t understand the output of pandas\' groupby. I started with a DataFrame (df0) with 5 fields/columns (zip, city, location, population, state).

相关标签:

2条回答

面向向阳花

2021-01-19 16:19

It's hard to say definitively without sample data, but with the code you show, returning a Series, you should be able to access the population for a city by using something like df6.loc['Albany', 'NY'] (that is, index your grouped Series by the city and state).

The reason you get a Series is because you selected a single column ('pop') on which to apply your group computation. If you apply your group computation to a list of columns, you'll get a DataFrame. You could do this by doing df6 = df0.groupby(['city','state'])[['pop']].sum(). (Note the extra brackets around 'pop', to select a list of one column instead of a single column.) But I'm not sure there's a reason to do this if you can use the above method to access the city data anyway.

0 讨论(0)
发布评论:

提交评论
- 加载中...

面向向阳花

2021-01-19 16:27

Need parameter as_index=False in groupby or reset_index for convert MultiIndex to columns:

df6 = df0.groupby(['city','state'], as_index=False)['pop'].sum()

Or:

df6 = df0.groupby(['city','state'])['pop'].sum().reset_index()

Sample:

df0 = pd.DataFrame({'city':['a','a','b'],
                   'state':['t','t','n'],
                   'pop':[7,8,9]})

print (df0)
  city  pop state
0    a    7     t
1    a    8     t
2    b    9     n

df6 = df0.groupby(['city','state'], as_index=False)['pop'].sum()
print (df6)
  city state  pop
0    a     t   15
1    b     n    9

df6 = df0.groupby(['city','state'])['pop'].sum().reset_index()
print (df6)
  city state  pop
0    a     t   15
1    b     n    9

Last select by loc, for scalar add item():

print (df6.loc[df6.state == 't', 'pop'])
0    15
Name: pop, dtype: int64

print (df6.loc[df6.state == 't', 'pop'].item())
15

But if need only lookup table is possible use Series with MultiIndex:

s = df0.groupby(['city','state'])['pop'].sum()
print (s)
city  state
a     t        15
b     n         9
Name: pop, dtype: int64

#select all cities by : and state by string like 't'
#output is Series of len 1
print (s.loc[:, 't'])
city
a    15
Name: pop, dtype: int64

#if need output as scalar add item()
print (s.loc[:, 't'].item())
15

0 讨论(0)