Python pandas idxmax for multiple indexes in a dataframe


I have a series that looks like this:

            delivery
2007-04-26  706           23
2007-04-27  705           10
            706         1089
            708


                      
3 answers

予麋鹿, 2021-02-10 01:03

Your example code doesn't work because idxmax is executed after the groupby operation, so it runs on the whole dataframe rather than on each group.

I'm not sure how to use idxmax on multilevel indexes, so here's a simple workaround.

Setting up the data:

import pandas as pd

# Keys are ordered so the columns print in the same order as the output below.
d = {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
              '2007-04-27', '2007-04-28', '2007-04-28'],
     'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11],
     'DeliveryNb': [706, 705, 708, 450, 283, 45, 89]}

df = pd.DataFrame.from_dict(d, orient='columns').set_index('Date')
print(df)


Output:

            DeliveryCount  DeliveryNb
Date                                 
2007-04-26             23         706
2007-04-27             10         705
2007-04-27           1089         708
2007-04-27             82         450
2007-04-27             34         283
2007-04-28            100          45
2007-04-28             11          89


Creating a custom function:

The trick is to use the reset_index() method, so that idxmax gives you the integer position of the row within the group.

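def func(df):
    # Reset the index so idxmax returns an integer position within the group
    idx = df.reset_index()['DeliveryCount'].idxmax()
    # Use that position to pick the matching delivery number
    return df['DeliveryNb'].iloc[idx]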


Applying it:

g = df.groupby(df.index)
g.apply(func)


Result:

Date
2007-04-26    706
2007-04-27    708
2007-04-28     45
dtype: int64
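
For comparison, the same result can probably be reached without a custom function. The following is only a sketch, assuming a reasonably recent pandas version: it resets the index so that every row has a unique integer label, which lets the per-group idxmax result be passed straight to loc. The names flat, best_rows and result are illustrative and not part of the answer above.

```python
import pandas as pd

d = {'Date': ['2007-04-26', '2007-04-27', '2007-04-27', '2007-04-27',
              '2007-04-27', '2007-04-28', '2007-04-28'],
     'DeliveryCount': [23, 10, 1089, 82, 34, 100, 11],
     'DeliveryNb': [706, 705, 708, 450, 283, 45, 89]}
df = pd.DataFrame(d).set_index('Date')

# Move Date back into a column so the row labels are unique integers.
flat = df.reset_index()

# For each date, idxmax returns the label of the row with the largest count.
best_rows = flat.loc[flat.groupby('Date')['DeliveryCount'].idxmax()]

# Re-index by date and keep only the delivery number.
result = best_rows.set_index('Date')['DeliveryNb']
print(result)
```

This avoids calling a Python function once per group, which may matter on larger frames.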

                                                                        
                                                        
            
            
              
                
梦谈多话, 2021-02-10 01:20

If you have the following dataframe (you can always reset the index if needed with df = df.reset_index()):

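  Date  Del_Count  Del_Nb
0  1/1         14      19   <
1              11      17
2  2/2         25      29   <
3              21      27
4              22      28
5  3/3         34      36
6              37      37
7              31      39   <

(A blank Date cell repeats the date above it; < marks the row with the largest Del_Nb for each date.)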


To find the row with the largest Del_Nb for each Date and extract the corresponding Del_Count, you can use:

df = df.loc[df.groupby(['Date'], sort=False)['Del_Nb'].idxmax()][['Date','Del_Count','Del_Nb']]


Which would yield:

  Date  Del_Count  Del_Nb
0  1/1         14      19
2  2/2         25      29
7  3/3         31      39
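
A self-contained version of this approach might look like the sketch below. The frame is rebuilt by hand with every Date cell filled in, which is an assumption about what the blank cells in the listing stand for, and the selection uses loc. The names idx and result are illustrative.

```python
import pandas as pd

# Hand-built copy of the example frame; the Date column is filled in for
# every row (assumed to be what the blank cells in the listing mean).
df = pd.DataFrame({
    'Date': ['1/1', '1/1', '2/2', '2/2', '2/2', '3/3', '3/3', '3/3'],
    'Del_Count': [14, 11, 25, 21, 22, 34, 37, 31],
    'Del_Nb': [19, 17, 29, 27, 28, 36, 37, 39],
})

# Label of the row with the largest Del_Nb within each date.
idx = df.groupby('Date', sort=False)['Del_Nb'].idxmax()

# Select those rows and the columns of interest in one loc call.
result = df.loc[idx, ['Date', 'Del_Count', 'Del_Nb']]
print(result)
```

Passing both the row labels and the column list to a single loc call avoids the chained indexing of the one-liner while selecting the same rows.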

                                                                        
                                                        
            
            
              
                
旧时难觅i, 2021-02-10 01:24

Suppose you have this series:

            delivery
2001-01-02  0           2
            1           3
            6           2
            7           2
            9           3
2001-01-03  3           2
            6           1
            7           1
            8           3
            9           1
dtype: int64


If you want one delivery per date with the maximum value, you could use idxmax:

dates = series.index.get_level_values(0)
series.loc[series.groupby(dates).idxmax()]


yields

            delivery
2001-01-02  1           3
2001-01-03  8           3
dtype: int64
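
To try this without the random generator, the series shown above can be typed in directly. This is a sketch under a few assumptions: the dates are plain strings rather than date objects, the level names date and delivery are made up for illustration, and the idxmax result is converted to a list of tuples before being handed to loc.

```python
import pandas as pd

# Hand-typed copy of the example series; dates are strings here (assumption).
index = pd.MultiIndex.from_tuples(
    [('2001-01-02', 0), ('2001-01-02', 1), ('2001-01-02', 6),
     ('2001-01-02', 7), ('2001-01-02', 9),
     ('2001-01-03', 3), ('2001-01-03', 6), ('2001-01-03', 7),
     ('2001-01-03', 8), ('2001-01-03', 9)],
    names=['date', 'delivery'])  # level names are illustrative
series = pd.Series([2, 3, 2, 2, 3, 2, 1, 1, 3, 1], index=index)

dates = series.index.get_level_values(0)

# With a MultiIndex, idxmax yields one (date, delivery) tuple per group,
# pointing at the first occurrence of the maximum.
best_labels = series.groupby(dates).idxmax()

# A list of tuples selects exactly those rows.
best = series.loc[best_labels.tolist()]
print(best)
```

Because idxmax keeps only the first maximum, a tie such as deliveries 1 and 9 on 2001-01-02 is resolved in favour of the earlier row.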


If you want all deliveries per date with the maximum value, use transform to generate a boolean mask:

mask = series.groupby(dates).transform(lambda x: x==x.max()).astype('bool')
series.loc[mask]


yields

            delivery
2001-01-02  1           3
            9           3
2001-01-03  8           3
dtype: int64
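
An equivalent mask can probably be built without a lambda by broadcasting each group's maximum back onto the rows with transform('max'). The sketch below assumes the same hand-typed series (string dates, illustrative level names) and groups on the first index level.

```python
import pandas as pd

# Hand-typed copy of the example series; dates are strings here (assumption).
index = pd.MultiIndex.from_product(
    [['2001-01-02'], [0, 1, 6, 7, 9]], names=['date', 'delivery']
).append(pd.MultiIndex.from_product(
    [['2001-01-03'], [3, 6, 7, 8, 9]], names=['date', 'delivery']))
series = pd.Series([2, 3, 2, 2, 3, 2, 1, 1, 3, 1], index=index)

# transform('max') returns a series aligned with the original, holding
# each row's per-date maximum.
group_max = series.groupby(level=0).transform('max')

# Rows equal to their group's maximum, ties included.
mask = series == group_max
all_best = series[mask]
print(all_best)
```

The comparison already produces a boolean series, so no astype call is needed in this variant.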




This is the code I used to generate series:

import pandas as pd
import numpy as np

np.random.seed(1)
N = 20
rng = pd.date_range('2001-01-02', periods=N//2, freq='4H')
rng = np.random.choice(rng, N, replace=True)
rng.sort()
df = pd.DataFrame(np.random.randint(10, size=(N,)), columns=['delivery'], index=rng)
series = df.groupby([df.index.date, 'delivery']).size()

                                                                        
                                                        
            
            
              
                