Advanced cross-section with multi-index in pandas


梦如初夏 2021-01-07 00:59

I have the following dataframe:

lb = [('A','a',1), ('A','a',2), ('A','a',3), ('A','b',1), ('A','b',2), ('A','b',3), ('B','a',1),


      
      
        
2 answers

迷失自我 (original poster) 2021-01-07 01:23
              

            
            
                        
Slicing a MultiIndex with pd.IndexSlice is a new feature in pandas 0.14.0 (see the whatsnew notes for that release). It effectively replaces the need for .xs.

In [8]: idx = pd.IndexSlice

In [9]: df.loc[:,idx[:,:,[2,3]]]
Out[9]: 
first          A                                       B                              
second         a                   b                   a                   b          
third          2         3         2         3         2         3         2         3
0       1.770120 -0.362269 -0.804352  1.549652  0.069858 -0.274113  0.570410 -0.460956
1      -0.982169  2.044497  0.571353  0.310634 -1.865966 -0.862613  0.124413  0.645419
2      -1.412519  0.168448  0.081467 -0.220464  1.033748  1.561429  0.094363  0.254768
3      -0.653458 -0.978661  0.158708 -0.818675 -1.122577  0.026941  2.678548  0.864817
4      -0.555179 -0.155564  1.148956  1.438523 -1.254660  0.609254 -0.970612  1.519028
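
For readers who want to reproduce the selection above, here is a minimal sketch. The construction of col is an assumption (the question's definition is cut off): it takes the columns to be the full product of the three levels shown in the printed header. The .xs call is included only for comparison.

```python
import numpy as np
import pandas as pd

# Assumption: columns are the full product of the three levels,
# named first/second/third as in the printed output.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.random.randn(5, 12), columns=col)

idx = pd.IndexSlice

# Keep every column whose third level is 2 or 3, for all first/second values.
sel = df.loc[:, idx[:, :, [2, 3]]]

# For comparison: .xs selects a single key on one level and
# drops that level from the result by default.
only_2 = df.xs(2, level="third", axis=1)
```

The slicer keeps all three column levels, whereas the .xs result here has only first and second.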


Subtracting the third == 1 columns from this selection is non-trivial.

In [107]: df = pd.DataFrame(np.arange(5*12).reshape(-1,12), columns=col)

In [108]: df
Out[108]: 
first    A                       B                    
second   a           b           a           b        
third    1   2   3   1   2   3   1   2   3   1   2   3
0        0   1   2   3   4   5   6   7   8   9  10  11
1       12  13  14  15  16  17  18  19  20  21  22  23
2       24  25  26  27  28  29  30  31  32  33  34  35
3       36  37  38  39  40  41  42  43  44  45  46  47
4       48  49  50  51  52  53  54  55  56  57  58  59
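
The name col comes from the question, whose definition is truncated here. One way to build columns that reproduce the printed header, assuming they are the full product of the three levels, might look like this:

```python
import numpy as np
import pandas as pd

# Assumption: the truncated list of tuples in the question enumerates
# every combination of ('A','B') x ('a','b') x (1,2,3).
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)

# Integer data makes the later arithmetic easy to check by eye.
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)
```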


Pandas wants to align the right-hand side (after all, you are subtracting DIFFERENT indexes),
so you need to broadcast it manually. Here is an issue about this: https://github.com/pydata/pandas/issues/7475

In [109]: df.loc[:,idx[:,:,[2,3]]] - np.tile(df.loc[:,idx[:,:,1]].values,2)
Out[109]: 
first   A           B         
second  a     b     a     b   
third   2  3  2  3  2  3  2  3
0       1 -1 -2 -4  7  5  4  2
1       1 -1 -2 -4  7  5  4  2
2       1 -1 -2 -4  7  5  4  2
3       1 -1 -2 -4  7  5  4  2
4       1 -1 -2 -4  7  5  4  2
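
A note on the layout of the broadcast values: np.tile repeats the whole block of third == 1 values, so they arrive in the order A-a, A-b, B-a, B-b, A-a, A-b, B-a, B-b. If the goal is to subtract each (first, second) group's own third == 1 column from its 2 and 3 columns (an assumption, since the question is cut off), np.repeat along axis 1 may be the better fit. The sketch below also shows why plain subtraction does not work. The construction of col is assumed, not taken from the question.

```python
import numpy as np
import pandas as pd

# Assumption: columns are the full product of the three levels.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)
idx = pd.IndexSlice

lhs = df.loc[:, idx[:, :, [2, 3]]]   # 8 columns: third in {2, 3}
base = df.loc[:, idx[:, :, 1]]       # 4 columns: third == 1

# Plain subtraction aligns on column labels; no label is shared,
# so every cell of the result is NaN.
aligned = lhs - base

# np.repeat duplicates each base column next to itself:
# [A-a, A-a, A-b, A-b, B-a, B-a, B-b, B-b], matching lhs column order.
diff = lhs - np.repeat(base.values, 2, axis=1)
```

With the integer data above, each row of diff is 1, 2 repeated four times, because every column with third equal to 2 or 3 sits one or two positions after its group's third == 1 column.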

    
             
                                                        
            
            
              
                