How do you update the levels of a pandas MultiIndex after slicing its DataFrame?

后端未结

关注

 3  603

I have a Dataframe with a pandas MultiIndex:

In [1]: import pandas as pd
In [2]: multi_index = pd.MultiIndex.from_product([[\'CAN\',\'USA\'],[\'total\']],nam


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  不知归路        
                
              
                            
                2020-11-27 06:27
              
            
            
                                                                       
From version pandas 0.20.0+ use MultiIndex.remove_unused_levels:

print (df.index)
MultiIndex(levels=[['CAN', 'USA'], ['total']],
           labels=[[1], [0]],
           names=['country', 'sex'])

df.index = df.index.remove_unused_levels()

print (df.index)
MultiIndex(levels=[['USA'], ['total']],
           labels=[[0], [0]],
           names=['country', 'sex'])

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  遇见更好的自我        
                
              
                            
                2020-11-27 06:35
              
            
            
                                                                       
I will be surprised if there is a more "built-in" way to eliminate the unused country than to re-create the index in the way you're doing (or some similar way). If you look at your index before and after the slice:

In [165]: df.index
Out[165]:
MultiIndex(levels=[[u'CAN', u'USA'], [u'total']],
           labels=[[0, 1], [0, 0]],
           names=[u'country', u'sex'])

In [166]: df = df.query('pop > 100')

In [167]: df.index
Out[167]:
MultiIndex(levels=[[u'CAN', u'USA'], [u'total']],
           labels=[[1], [0]],
           names=[u'country', u'sex'])


you can see that the labels - which are indexes into the level values - have updated but not the level values. This may be an imperfect analogy, but it strikes me that the level values are analogous to an enumerated column in a database table, while the labels are analogous to the actual values of rows in the table. If you delete all the rows in a table with a value of "CAN", it doesn't change the fact that "CAN" is still a valid choice based on the column definition. To remove "CAN" from the enumeration, you have to alter the column definition; this is the equivalent of reindexing the dataframe in pandas.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一向        
                
              
                            
                2020-11-27 06:45
              
            
            
                                                                       
This is something that has bitten me before. Dropping columns or rows does NOT change the underlying MultiIndex, for performance and philosophical reasons, and this is officially not considered a bug (read more here). The short answer is that the developers say "that's not what the MultiIndex is for". If you need a list of the contents of a MultiIndex level after modification, for example for iteration or to check to see if something is included, you can use:

df.index.get_level_values(<levelname>)


This returns the current active values within that index level.

So I guess the "trick" here is that the API native way to do it is to use get_level_values instead of just .index or .columns
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复