Advanced cross-section with multi-index in pandas

梦如初夏 2021-01-07 00:59

I have the following dataframe:

lb = [('A','a',1), ('A','a',2), ('A','a',3), ('A','b',1), ('A','b',2), ('A','b',3), ('B','a',1),


        
2 Answers
  •  迷失自我
    2021-01-07 01:23

    Slicing a MultiIndex with pd.IndexSlice inside .loc is a new feature in pandas 0.14.0 (see the whatsnew notes for that release). It effectively replaces the need for .xs.

    In [8]: idx = pd.IndexSlice
    
    In [9]: df.loc[:,idx[:,:,[2,3]]]
    Out[9]: 
    first          A                                       B                              
    second         a                   b                   a                   b          
    third          2         3         2         3         2         3         2         3
    0       1.770120 -0.362269 -0.804352  1.549652  0.069858 -0.274113  0.570410 -0.460956
    1      -0.982169  2.044497  0.571353  0.310634 -1.865966 -0.862613  0.124413  0.645419
    2      -1.412519  0.168448  0.081467 -0.220464  1.033748  1.561429  0.094363  0.254768
    3      -0.653458 -0.978661  0.158708 -0.818675 -1.122577  0.026941  2.678548  0.864817
    4      -0.555179 -0.155564  1.148956  1.438523 -1.254660  0.609254 -0.970612  1.519028
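    A runnable sketch of the selection above, under stated assumptions: the column tuples are a hypothetical reconstruction of the truncated lb list (every combination of A/B, a/b and 1/2/3, in the order the output headers show), the level names "first", "second" and "third" are taken from those headers, and the data is seeded random, so the numbers will not match the output above. The second selection shows that pd.IndexSlice is only a convenience for writing the tuple of slices.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's column tuples: every
# (first, second, third) combination of A/B, a/b and 1/2/3.
lb = [(f, s, t) for f in "AB" for s in "ab" for t in (1, 2, 3)]
col = pd.MultiIndex.from_tuples(lb, names=["first", "second", "third"])

# Seeded random data, so the values differ from the output above.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 12)), columns=col)

idx = pd.IndexSlice
# Every first/second label, but only third-level labels 2 and 3.
sel = df.loc[:, idx[:, :, [2, 3]]]

# pd.IndexSlice just builds this tuple of slices for you.
same = df.loc[:, (slice(None), slice(None), [2, 3])]
```

    The variable names sel and same are illustrative only.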
    

    Subtracting the third == 1 columns from these is non-trivial.

    In [107]: df = pd.DataFrame(np.arange(5*12).reshape(-1,12), columns=col)
    
    In [108]: df
    Out[108]: 
    first    A                       B                    
    second   a           b           a           b        
    third    1   2   3   1   2   3   1   2   3   1   2   3
    0        0   1   2   3   4   5   6   7   8   9  10  11
    1       12  13  14  15  16  17  18  19  20  21  22  23
    2       24  25  26  27  28  29  30  31  32  33  34  35
    3       36  37  38  39  40  41  42  43  44  45  46  47
    4       48  49  50  51  52  53  54  55  56  57  58  59
    

    Pandas aligns the right-hand side on its column labels (after all, you are subtracting DIFFERENT indexes), so you need to broadcast the values manually. There is an issue about this: https://github.com/pydata/pandas/issues/7475
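    To see the alignment problem concretely, here is a minimal sketch of the naive subtraction on the arange frame. It assumes the same hypothetical reconstruction of the column tuples (A/B, a/b, 1/2/3 with level names first/second/third); the name naive is illustrative. Because the two operands share no column labels, pandas takes the union of the columns and fills every cell with NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the 5 x 12 example frame.
lb = [(f, s, t) for f in "AB" for s in "ab" for t in (1, 2, 3)]
col = pd.MultiIndex.from_tuples(lb, names=["first", "second", "third"])
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)

idx = pd.IndexSlice
# Left has third == 2/3 columns, right has third == 1 columns: no overlap,
# so the outer-joined result has all 12 columns and only NaN values.
naive = df.loc[:, idx[:, :, [2, 3]]] - df.loc[:, idx[:, :, 1]]
```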

    In [109]: df.loc[:,idx[:,:,[2,3]]] = df.loc[:,idx[:,:,[2,3]]] - np.repeat(df.loc[:,idx[:,:,1]].values, 2, axis=1)
    
    In [110]: df.loc[:,idx[:,:,[2,3]]]
    Out[110]: 
    first   A           B         
    second  a     b     a     b   
    third   2  3  2  3  2  3  2  3
    0       1  2  1  2  1  2  1  2
    1       1  2  1  2  1  2  1  2
    2       1  2  1  2  1  2  1  2
    3       1  2  1  2  1  2  1  2
    4       1  2  1  2  1  2  1  2
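    If you would rather let pandas pair the columns by label instead of relying on the physical column order of .values, one alternative (a sketch, not the original answer's method) is to take a cross-section per third-level value with .xs, subtract the third == 1 cross-section (both have identical (first, second) columns, so alignment works in your favour), and concatenate the pieces back. The column reconstruction is the same hypothetical one as before, and base, diffs and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the 5 x 12 example frame.
lb = [(f, s, t) for f in "AB" for s in "ab" for t in (1, 2, 3)]
col = pd.MultiIndex.from_tuples(lb, names=["first", "second", "third"])
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)

# Baseline: the third == 1 columns with the third level dropped,
# leaving (first, second) column labels.
base = df.xs(1, axis=1, level="third")

# Each cross-section has the same (first, second) labels as base,
# so the subtraction is aligned by label, not by position.
diffs = {t: df.xs(t, axis=1, level="third") - base for t in (2, 3)}

# concat puts the dict keys in a new outer "third" level;
# move it back to the innermost position and restore the sort order.
result = pd.concat(diffs, axis=1, names=["third"])
result = result.reorder_levels([1, 2, 0], axis=1).sort_index(axis=1)
```

    Every third == 2 column ends up with the difference 1 and every third == 3 column with 2, matching a per-group subtraction of the third == 1 column.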
    
