idxmax() doesn't work on SeriesGroupBy that contains NaN

借酒劲吻你 2021-01-27 04:41

Here is my code

from pandas import DataFrame, Series
import pandas as pd
import numpy as np
income = DataFrame({'name': ['Adam', 'Bill', 'Chris', 'Dave


        
3 Answers
  • 2021-01-27 05:21

    Because groupby preserves the order of rows within each group, you can sort income before grouping and then pick the first row of each group with head:

    # DataFrame.sort and .ix were removed from pandas; use sort_values and .loc instead.
    grouped = income.sort_values('income', ascending=False).groupby([ageBin])
    highestIncome = income.loc[grouped.head(1).index]
    # highestIncome is no longer ordered by age.
    # If you want to recover this, sort it again.
    highestIncome = highestIncome.sort_values('age')
    

    By the way, be aware that the reference manual does not mention that groupby preserves this order. I think the cleanest solution would be to fix pandas's idxmax so that it works here. It seems a little strange to me that idxmax fails while max works.

  • 2021-01-27 05:27

    Just apply a lambda function on the groups like so:

    grouped.apply(lambda x: x.max())
    
  • 2021-01-27 05:35
    grouped['income'].agg(lambda x : x.idxmax())
    
    
    Out[]:
    age
    (20, 30]     1
    (30, 40]   NaN
    (40, 50]     2
    (50, 60]     4
    Name: income, dtype: float64
    

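    You can then do the following to get the data (result is assumed to hold the output above):

    # .ix was removed from pandas; drop the NaN label before indexing with .loc
    income.loc[result.dropna().astype(int)]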
    
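For anyone trying the answers above on a current pandas release, here is a minimal, self-contained sketch of both ideas (idxmax per group, and sort then head). The question's code is truncated, so the data below is invented, and the names `age_bin`, `valid`, `highest_by_idxmax` and `highest_by_sort` are hypothetical. Dropping rows with a missing income before grouping is an assumption on my part, one way to keep all-NaN groups away from idxmax, not necessarily what the answerers intended.

```python
import numpy as np
import pandas as pd

# Hypothetical data: names, ages and incomes are made up for illustration.
income = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee", "Eve", "Fay"],
    "age": [22, 24, 37, 45, 51, 57],
    "income": [100.0, 120.0, np.nan, 110.0, 80.0, 100.0],
})

# Bin ages into (20, 30], (30, 40], (40, 50], (50, 60].
age_bin = pd.cut(income["age"], bins=[20, 30, 40, 50, 60])

# Assumption: rows without an income cannot be the "highest", so drop them
# before grouping. The (30, 40] bin then has no rows left at all.
valid = income.dropna(subset=["income"])
valid_bin = age_bin.loc[valid.index]

# Approach 1: idxmax per age bin. observed=True skips the empty bin.
idx = valid.groupby(valid_bin, observed=True)["income"].idxmax()
highest_by_idxmax = income.loc[idx.to_numpy()]

# Approach 2: sort by income (descending), take the first row of each bin,
# then restore the ordering by age.
sorted_income = valid.sort_values("income", ascending=False)
first_rows = sorted_income.groupby(
    age_bin.loc[sorted_income.index], observed=True
).head(1)
highest_by_sort = first_rows.sort_values("age")

print(highest_by_idxmax)
print(highest_by_sort)
```

Both approaches select the same rows here. If you need the all-NaN bin to show up in the result as a missing entry, you would have to reindex the result against the full set of bins afterwards.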