Advanced cross-section with multi-index in pandas


I have the following dataframe:

lb = [('A','a',1), ('A','a',2), ('A','a',3), ('A','b',1), ('A','b',2), ('A','b',3), ('B','a',1),


                      
2 answers

迷失自我 (2021-01-07 01:23)

Multi-index slicing with pd.IndexSlice is a new feature in pandas 0.14.0; see the 0.14.0 whatsnew notes. It effectively replaces the need for .xs.

In [8]: idx = pd.IndexSlice

In [9]: df.loc[:,idx[:,:,[2,3]]]
Out[9]: 
first          A                                       B                              
second         a                   b                   a                   b          
third          2         3         2         3         2         3         2         3
0       1.770120 -0.362269 -0.804352  1.549652  0.069858 -0.274113  0.570410 -0.460956
1      -0.982169  2.044497  0.571353  0.310634 -1.865966 -0.862613  0.124413  0.645419
2      -1.412519  0.168448  0.081467 -0.220464  1.033748  1.561429  0.094363  0.254768
3      -0.653458 -0.978661  0.158708 -0.818675 -1.122577  0.026941  2.678548  0.864817
4      -0.555179 -0.155564  1.148956  1.438523 -1.254660  0.609254 -0.970612  1.519028
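
As a minimal, self-contained sketch of the slicer above: the column layout below (a 2 x 2 x 3 product with levels named first, second and third) is an assumption reconstructed from the printed output, since the original construction code is truncated. It also shows what .xs can do with a single key, for comparison.

```python
import numpy as np
import pandas as pd

# Assumed column layout: levels first/second/third, built as a full product.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)

idx = pd.IndexSlice

# Keep every label on 'first' and 'second', but only 2 and 3 on 'third'.
subset = df.loc[:, idx[:, :, [2, 3]]]

# .xs selects a single label on one level; drop_level=False keeps that level.
only_two = df.xs(2, axis=1, level="third", drop_level=False)
```

Slicers like this expect the sliced axis to be sorted; columns built with from_product from sorted inputs already are.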


The subtraction is non-trivial.

In [107]: df = pd.DataFrame(np.arange(5*12).reshape(-1,12), columns=col)

In [108]: df
Out[108]: 
first    A                       B                    
second   a           b           a           b        
third    1   2   3   1   2   3   1   2   3   1   2   3
0        0   1   2   3   4   5   6   7   8   9  10  11
1       12  13  14  15  16  17  18  19  20  21  22  23
2       24  25  26  27  28  29  30  31  32  33  34  35
3       36  37  38  39  40  41  42  43  44  45  46  47
4       48  49  50  51  52  53  54  55  56  57  58  59


Pandas wants to align the right-hand side (after all, you are subtracting DIFFERENT indexes),
so you need to broadcast it manually. Here is an issue about this: https://github.com/pydata/pandas/issues/7475

In [109]: df.loc[:,idx[:,:,[2,3]]] = df.loc[:,idx[:,:,[2,3]]]-np.tile(df.loc[:,idx[:,:,1]].values,2)

In [110]: df.loc[:,idx[:,:,[2,3]]]
Out[110]: 
first   A           B         
second  a     b     a     b   
third   2  3  2  3  2  3  2  3
0       1 -1 -2 -4  7  5  4  2
1       1 -1 -2 -4  7  5  4  2
2       1 -1 -2 -4  7  5  4  2
3       1 -1 -2 -4  7  5  4  2
4       1 -1 -2 -4  7  5  4  2
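
The sketch below illustrates why the raw values are needed and how the broadcast lines up. It assumes the same 2 x 2 x 3 column layout as the printed frame (the original construction is truncated). It also assumes, which the truncated question does not confirm, that the goal may be to subtract each (first, second) group's third == 1 column from that same group's 2 and 3 columns. Under that assumption np.repeat, not np.tile, gives the matching column order.

```python
import numpy as np
import pandas as pd

# Assumed column layout, matching the printed frame.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)
idx = pd.IndexSlice

target = df.loc[:, idx[:, :, [2, 3]]]   # shape (5, 8)
base_df = df.loc[:, idx[:, :, 1]]       # shape (5, 4), one column per group

# Label alignment: the two frames share no column labels, so every cell is NaN.
aligned = target - base_df

base = base_df.to_numpy()

# np.tile repeats the whole block: groups line up as g0 g1 g2 g3 g0 g1 g2 g3.
tiled = target - np.tile(base, 2)

# np.repeat repeats each column in place: g0 g0 g1 g1 g2 g2 g3 g3,
# which follows the (first, second) order of the target columns.
per_group = target - np.repeat(base, 2, axis=1)
```

With np.tile, the first row reproduces the output shown above (1 -1 -2 -4 7 5 4 2). With np.repeat, every group yields 1 and 2, because each group's own third == 1 value is subtracted.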

                                                                        
                                                        
            
            
              
                
梦谈多话 (2021-01-07 01:25)

It seems the xs function cannot be used with more than a single key. A fancier slicing method may exist, but I would keep it as simple as possible and build a partial list of multi-index columns that fits my needs:

cols = df.columns
thirdlvl = cols.get_level_values('third')

partialcols = [col for col, third in zip(cols, thirdlvl) if third in [2,3]]


With these columns, you get the partial data frame you want:

print(df[partialcols])

first          A                                       B                              
second         a                   b                   a                   b          
third          2         3         2         3         2         3         2         3
0       1.103063  1.036151 -0.018996  1.436792 -0.956119  1.587688  2.262837 -1.059619
1       0.950664  1.847895 -1.172043  0.752676 -0.091956 -0.431509 -0.653317 -0.545843
2       0.165655 -0.180710 -1.844222 -0.836338  1.687806 -0.469707 -0.374222  0.132809
3      -0.275194  0.141292  1.021046 -0.010747  1.725614  0.530589  0.106327  0.138661
4       0.371840  0.455063 -2.643567  0.406322 -0.717277  0.667969  0.660701 -1.324643


EDIT: Of course, the simple piece of code below will also find the right columns:

partialcols = [col for col in cols if col[2] in [2,3]]
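
A variant of the same idea, sketched here as an alternative rather than as part of the original answer: build a boolean mask from the third level with Index.isin and pass it to .loc. The column layout is again an assumption (a 2 x 2 x 3 product with levels named first, second and third).

```python
import numpy as np
import pandas as pd

# Assumed column layout, matching the frames printed in this thread.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)

# List-comprehension approach from the answer: collect full column tuples.
partialcols = [c for c in df.columns if c[2] in [2, 3]]
by_list = df[partialcols]

# Boolean-mask variant: test the 'third' level directly, then select with .loc.
mask = df.columns.get_level_values("third").isin([2, 3])
by_mask = df.loc[:, mask]
```

Both selections pick the same eight columns. The mask version refers to the level by name, so it does not depend on third being at position 2.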

                                                                        
                                                        
            
            
              
                