How can I get pandas' groupby command to return a DataFrame instead of a Series?

逝去的感伤 2021-01-19 15:56

I don't understand the output of pandas' groupby. I started with a DataFrame (df0) with 5 fields/columns (zip, city, location, population, state).

2 answers
  • 2021-01-19 16:19

    It's hard to say definitively without sample data, but since the code you show returns a Series, you should be able to access the population for a city with something like df6.loc['Albany', 'NY'] (that is, index your grouped Series by the city and state).
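
To make that lookup concrete, here is a minimal sketch. The data is made up for illustration (the question's real df0 isn't shown), and I'm assuming the column names 'city', 'state' and 'pop' used elsewhere in this thread.

```python
import pandas as pd

# Hypothetical sample data; the real df0 from the question is not available.
df0 = pd.DataFrame({
    "city": ["Albany", "Albany", "Boston"],
    "state": ["NY", "NY", "MA"],
    "pop": [7, 8, 9],
})

# Selecting a single column by name gives a Series indexed by (city, state).
df6 = df0.groupby(["city", "state"])["pop"].sum()

# Look up one group by its full (city, state) key; this returns a scalar.
albany_pop = df6.loc[("Albany", "NY")]
print(albany_pop)
```

Passing the key as a tuple is equivalent to df6.loc['Albany', 'NY'] here; the tuple form just makes it explicit that both values belong to the row index.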

    You get a Series because you selected a single column ('pop') on which to apply your group computation. If you apply the computation to a list of columns, you'll get a DataFrame, for example df6 = df0.groupby(['city','state'])[['pop']].sum(). (Note the extra brackets around 'pop', which select a list of one column instead of a single column.) But I'm not sure there's a reason to do this if you can use the above method to access the city data anyway.
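
A quick way to see the single-bracket versus double-bracket difference is to compare the result types side by side. This is only a sketch on made-up data, again assuming the 'city', 'state' and 'pop' column names.

```python
import pandas as pd

# Hypothetical sample data, assumed for illustration only.
df0 = pd.DataFrame({
    "city": ["Albany", "Albany", "Boston"],
    "state": ["NY", "NY", "MA"],
    "pop": [7, 8, 9],
})

grouped = df0.groupby(["city", "state"])

# A single column name selects one column, so the result is a Series.
as_series = grouped["pop"].sum()

# A list containing one column name keeps the result two-dimensional.
as_frame = grouped[["pop"]].sum()

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

In both cases city and state end up in the row index (a MultiIndex), so the DataFrame version still needs reset_index() if you want them back as ordinary columns.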

  • 2021-01-19 16:27

    You need the parameter as_index=False in groupby, or a call to reset_index, to convert the MultiIndex into columns:

    df6 = df0.groupby(['city','state'], as_index=False)['pop'].sum()
    

    Or:

    df6 = df0.groupby(['city','state'])['pop'].sum().reset_index()
    

    Sample:

    import pandas as pd

    df0 = pd.DataFrame({'city':['a','a','b'],
                       'state':['t','t','n'],
                       'pop':[7,8,9]})
    
    print (df0)
      city state  pop
    0    a     t    7
    1    a     t    8
    2    b     n    9
    
    df6 = df0.groupby(['city','state'], as_index=False)['pop'].sum()
    print (df6)
      city state  pop
    0    a     t   15
    1    b     n    9
    

    df6 = df0.groupby(['city','state'])['pop'].sum().reset_index()
    print (df6)
      city state  pop
    0    a     t   15
    1    b     n    9
    

    Finally, select with loc; to get a scalar, add item():

    print (df6.loc[df6.state == 't', 'pop'])
    0    15
    Name: pop, dtype: int64
    
    print (df6.loc[df6.state == 't', 'pop'].item())
    15
    

    But if you only need a lookup table, you can use the Series with a MultiIndex directly:

    s = df0.groupby(['city','state'])['pop'].sum()
    print (s)
    city  state
    a     t        15
    b     n         9
    Name: pop, dtype: int64
    
    # select all cities with : and the state with a string such as 't'
    # the output is a Series of length 1
    print (s.loc[:, 't'])
    city
    a    15
    Name: pop, dtype: int64
    
    # if you need the output as a scalar, add item()
    print (s.loc[:, 't'].item())
    15
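
One caveat worth sketching: item() only works when the selection holds exactly one value. In the sample above only one city is in state 't', so it succeeds. The data below is a hypothetical variation with a second city in 't' (city 'c' is my addition, not part of the original sample) to show what happens otherwise.

```python
import pandas as pd

# Hypothetical variation of the sample: city 'c' is added so that
# two different cities fall in state 't'.
df0 = pd.DataFrame({
    "city": ["a", "a", "b", "c"],
    "state": ["t", "t", "n", "t"],
    "pop": [7, 8, 9, 4],
})

s = df0.groupby(["city", "state"])["pop"].sum()

# Two cities match state 't', so the selection has length 2.
matches = s.loc[:, "t"]

item_failed = False
try:
    matches.item()
except ValueError as exc:
    # item() refuses to convert a selection with more than one element.
    item_failed = True
    print("item() failed:", exc)

# Giving the full (city, state) key returns a scalar directly.
value = s.loc[("a", "t")]
print(value)
```

So if a state can contain several cities, looking up the full (city, state) key is the safer way to get a single number.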
    