Advanced cross-section with multi-index in pandas

梦如初夏 2021-01-07 00:59

I have the following dataframe:

lb = [('A','a',1), ('A','a',2), ('A','a',3), ('A','b',1), ('A','b',2), ('A','b',3), ('B','a',1),


        
2 Answers
  • 2021-01-07 01:23

    Multi-index slicing with pd.IndexSlice is a new feature in pandas 0.14.0 (see the whatsnew notes for that release). It effectively replaces the need for .xs.

    In [8]: idx = pd.IndexSlice
    
    In [9]: df.loc[:,idx[:,:,[2,3]]]
    Out[9]: 
    first          A                                       B                              
    second         a                   b                   a                   b          
    third          2         3         2         3         2         3         2         3
    0       1.770120 -0.362269 -0.804352  1.549652  0.069858 -0.274113  0.570410 -0.460956
    1      -0.982169  2.044497  0.571353  0.310634 -1.865966 -0.862613  0.124413  0.645419
    2      -1.412519  0.168448  0.081467 -0.220464  1.033748  1.561429  0.094363  0.254768
    3      -0.653458 -0.978661  0.158708 -0.818675 -1.122577  0.026941  2.678548  0.864817
    4      -0.555179 -0.155564  1.148956  1.438523 -1.254660  0.609254 -0.970612  1.519028
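
    A minimal, self-contained sketch of the selection above. The column layout is an assumption inferred from the printed output (the question's own construction code is cut off), and integer data is used instead of random values so the result is easy to check by eye. The .xs call is included only for comparison with the single-key case.

```python
import numpy as np
import pandas as pd

# Assumed column layout, inferred from the printed frames:
# first in {A, B}, second in {a, b}, third in {1, 2, 3}.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)

idx = pd.IndexSlice

# Keep every 'first' and 'second' label, but only third in {2, 3}.
selected = df.loc[:, idx[:, :, [2, 3]]]

# For comparison: .xs takes a single key on one level and drops that level.
single = df.xs(2, level="third", axis=1)
```

    Here selected keeps all three column levels, while single is left with only first and second.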
    

    Subtracting the third == 1 columns from this selection is non-trivial.

    In [107]: df = pd.DataFrame(np.arange(5*12).reshape(-1,12), columns=col)
    
    In [108]: df
    Out[108]: 
    first    A                       B                    
    second   a           b           a           b        
    third    1   2   3   1   2   3   1   2   3   1   2   3
    0        0   1   2   3   4   5   6   7   8   9  10  11
    1       12  13  14  15  16  17  18  19  20  21  22  23
    2       24  25  26  27  28  29  30  31  32  33  34  35
    3       36  37  38  39  40  41  42  43  44  45  46  47
    4       48  49  50  51  52  53  54  55  56  57  58  59
    

    Pandas wants to align the right-hand side (after all, you are subtracting frames with DIFFERENT column indexes), so you need to broadcast it manually by working on the raw values. Here is an issue about this: https://github.com/pydata/pandas/issues/7475

    In [109]: df.loc[:,idx[:,:,[2,3]]] = df.loc[:,idx[:,:,[2,3]]]-np.tile(df.loc[:,idx[:,:,1]].values,2)

    In [110]: df.loc[:,idx[:,:,[2,3]]]
    Out[110]: 
    first   A           B         
    second  a     b     a     b   
    third   2  3  2  3  2  3  2  3
    0       1 -1 -2 -4  7  5  4  2
    1       1 -1 -2 -4  7  5  4  2
    2       1 -1 -2 -4  7  5  4  2
    3       1 -1 -2 -4  7  5  4  2
    4       1 -1 -2 -4  7  5  4  2
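
    A note on column order when broadcasting by hand, as a sketch under the same assumed column layout as the printed frames. np.tile repeats the whole block of third == 1 columns, which reproduces the output shown above. If the intent is instead to subtract each (first, second) group's own third == 1 column from that group's 2 and 3 columns (an assumption about the goal, since the question text is cut off), np.repeat along axis=1 lines the columns up that way.

```python
import numpy as np
import pandas as pd

# Assumed column layout, inferred from the printed frames.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)
idx = pd.IndexSlice

# Left side columns: (A,a,2), (A,a,3), (A,b,2), (A,b,3), (B,a,2), ...
lhs = df.loc[:, idx[:, :, [2, 3]]]

# Raw values of the third == 1 columns: (A,a,1), (A,b,1), (B,a,1), (B,b,1).
base = df.loc[:, idx[:, :, 1]].to_numpy()

# np.tile repeats the block: [Aa1, Ab1, Ba1, Bb1, Aa1, Ab1, Ba1, Bb1].
tiled = lhs - np.tile(base, 2)

# np.repeat repeats each column in place: [Aa1, Aa1, Ab1, Ab1, Ba1, ...].
per_group = lhs - np.repeat(base, 2, axis=1)
```

    With the integer data above, every row of per_group alternates between 1 and 2, because each third == 2 column is one more, and each third == 3 column two more, than its group's third == 1 column.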
    
  • 2021-01-07 01:25

    It seems that xs cannot be used with more than a single key. There may be a fancier way to slice, but I would keep it as simple as possible and build a partial MultiIndex that fits my needs:

    cols = df.columns
    thirdlvl = cols.get_level_values('third')
    
    partialcols = [col for col, third in zip(cols, thirdlvl) if third in [2,3]]
    

    With these columns, you get the partial data frame you want:

    print(df[partialcols])
    
    first          A                                       B                              
    second         a                   b                   a                   b          
    third          2         3         2         3         2         3         2         3
    0       1.103063  1.036151 -0.018996  1.436792 -0.956119  1.587688  2.262837 -1.059619
    1       0.950664  1.847895 -1.172043  0.752676 -0.091956 -0.431509 -0.653317 -0.545843
    2       0.165655 -0.180710 -1.844222 -0.836338  1.687806 -0.469707 -0.374222  0.132809
    3      -0.275194  0.141292  1.021046 -0.010747  1.725614  0.530589  0.106327  0.138661
    4       0.371840  0.455063 -2.643567  0.406322 -0.717277  0.667969  0.660701 -1.324643
    

    EDIT: This simpler piece of code will, of course, also find the right columns:

    partialcols = [col for col in cols if col[2] in [2,3]]
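
    For reference, a runnable sketch of the list-comprehension approach next to an equivalent boolean mask built from the level values. The column layout is an assumption inferred from the printed frames, and the mask variant is an alternative offered here, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed column layout, inferred from the printed frames.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)

cols = df.columns

# From the answer: keep the column tuples whose third entry is 2 or 3.
partialcols = [c for c in cols if c[2] in [2, 3]]
by_list = df[partialcols]

# Alternative: a boolean mask over the 'third' level values.
mask = cols.get_level_values("third").isin([2, 3])
by_mask = df.loc[:, mask]
```

    Both variants select the same eight columns in the same order.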
    