I have the following dataframe:
lb = [('A','a',1), ('A','a',2), ('A','a',3), ('A','b',1), ('A','b',2), ('A','b',3), ('B','a',1),
Multi-index slicing with pd.IndexSlice is a new feature in 0.14.0 (see the whatsnew notes). It effectively replaces the need for .xs.
In [8]: idx = pd.IndexSlice
In [9]: df.loc[:,idx[:,:,[2,3]]]
Out[9]:
first A B
second a b a b
third 2 3 2 3 2 3 2 3
0 1.770120 -0.362269 -0.804352 1.549652 0.069858 -0.274113 0.570410 -0.460956
1 -0.982169 2.044497 0.571353 0.310634 -1.865966 -0.862613 0.124413 0.645419
2 -1.412519 0.168448 0.081467 -0.220464 1.033748 1.561429 0.094363 0.254768
3 -0.653458 -0.978661 0.158708 -0.818675 -1.122577 0.026941 2.678548 0.864817
4 -0.555179 -0.155564 1.148956 1.438523 -1.254660 0.609254 -0.970612 1.519028
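For anyone who wants to reproduce the selection above, here is a minimal sketch. The way `col` is built (`MultiIndex.from_product` over A/B, a/b, 1/2/3 with the level names first/second/third) is an assumption inferred from the printed output, not taken from the original post, and the values are a simple `arange` rather than the random data shown.

```python
import numpy as np
import pandas as pd

# Assumed column layout: 2 x 2 x 3 levels named first/second/third.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)

idx = pd.IndexSlice
# Keep every first/second combination, but only third in {2, 3}.
subset = df.loc[:, idx[:, :, [2, 3]]]
print(subset.columns.get_level_values("third").tolist())
```

The slicer takes one entry per column level, so `idx[:, :, [2, 3]]` reads as "all of first, all of second, only 2 and 3 of third".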
The subtraction itself is non-trivial.
In [107]: df = pd.DataFrame(np.arange(5*12).reshape(-1,12), columns=col)
In [108]: df
Out[108]:
first A B
second a b a b
third 1 2 3 1 2 3 1 2 3 1 2 3
0 0 1 2 3 4 5 6 7 8 9 10 11
1 12 13 14 15 16 17 18 19 20 21 22 23
2 24 25 26 27 28 29 30 31 32 33 34 35
3 36 37 38 39 40 41 42 43 44 45 46 47
4 48 49 50 51 52 53 54 55 56 57 58 59
Pandas wants to align the right-hand side (after all, you are subtracting DIFFERENT indexes), so you need to broadcast it manually. Here is an issue about this: https://github.com/pydata/pandas/issues/7475
In [109]: df.loc[:,idx[:,:,[2,3]]] = df.loc[:,idx[:,:,[2,3]]]-np.tile(df.loc[:,idx[:,:,1]].values,2)
Out[109]:
first A B
second a b a b
third 2 3 2 3 2 3 2 3
0 1 -1 -2 -4 7 5 4 2
1 1 -1 -2 -4 7 5 4 2
2 1 -1 -2 -4 7 5 4 2
3 1 -1 -2 -4 7 5 4 2
4 1 -1 -2 -4 7 5 4 2
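A note on the broadcast: `np.tile(..., 2)` lays the four third == 1 columns out as one block repeated twice, which is what produces the row shown above. If the goal is instead to subtract each (first, second) group's own third == 1 column from that group's third == 2 and 3 columns (an assumption about the intent, not something stated in the post), `np.repeat` along axis 1 is one way to pair them up. A sketch, with `col` again built by an assumed `from_product` call:

```python
import numpy as np
import pandas as pd

# Assumed column layout, matching the frame printed above.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)
idx = pd.IndexSlice

targets = df.loc[:, idx[:, :, [2, 3]]]      # shape (5, 8)
base = df.loc[:, idx[:, :, 1]].to_numpy()   # shape (5, 4)

# Repeat each base column twice so it lines up with its own group's 2 and 3.
per_group = targets - np.repeat(base, 2, axis=1)
print(per_group.iloc[0].tolist())
```

Because a plain NumPy array carries no labels, pandas skips alignment and subtracts position by position, which is the manual broadcast described above.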
It seems that the xs function cannot be used with more than a single key. Fancier slicing may exist, but I would keep it as simple as possible and produce a partial MultiIndex that fits my needs:
cols = df.columns
thirdlvl = cols.get_level_values('third')
partialcols = [col for col, third in zip(cols, thirdlvl) if third in [2,3]]
With these columns, you get the partial data frame you want:
print(df[partialcols])
first A B
second a b a b
third 2 3 2 3 2 3 2 3
0 1.103063 1.036151 -0.018996 1.436792 -0.956119 1.587688 2.262837 -1.059619
1 0.950664 1.847895 -1.172043 0.752676 -0.091956 -0.431509 -0.653317 -0.545843
2 0.165655 -0.180710 -1.844222 -0.836338 1.687806 -0.469707 -0.374222 0.132809
3 -0.275194 0.141292 1.021046 -0.010747 1.725614 0.530589 0.106327 0.138661
4 0.371840 0.455063 -2.643567 0.406322 -0.717277 0.667969 0.660701 -1.324643
EDIT: Of course, the simpler piece of code below will also find the right columns:
partialcols = [col for col in cols if col[2] in [2,3]]
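As a cross-check, the same columns can also be picked with a boolean mask over the 'third' level instead of a list of tuples. This is only a sketch: the construction of `df` is an assumption (same `from_product` layout as the printed frames), and `by_mask` / `by_list` are names made up for the comparison.

```python
import numpy as np
import pandas as pd

# Assumed column layout: first/second/third levels as in the printed frames.
col = pd.MultiIndex.from_product(
    [["A", "B"], ["a", "b"], [1, 2, 3]],
    names=["first", "second", "third"],
)
df = pd.DataFrame(np.arange(5 * 12).reshape(-1, 12), columns=col)

cols = df.columns
# List-of-tuples approach from the answer.
partialcols = [c for c in cols if c[2] in [2, 3]]
by_list = df[partialcols]

# Boolean-mask variant: True where the third level is 2 or 3.
mask = cols.get_level_values("third").isin([2, 3])
by_mask = df.loc[:, mask]
print(by_mask.shape)
```

Both variants avoid the slicer syntax, so they may be a fallback on pandas versions older than 0.14.0, though that has not been verified here.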