I've read the docs about slicers a million times, but have never got my head round it, so I'm still trying to figure out how to use loc
to slice a DataFrame
with a MultiIndex
.
I'll start with the DataFrame
from this SO answer:
value first second third fourth A0 B0 C1 D0 2 D1 3 C3 D0 6 D1 7 B1 C1 D0 10 D1 11 C3 D0 14 D1 15 A1 B0 C1 D0 18 D1 19 C3 D0 22 D1 23 B1 C1 D0 26 D1 27 C3 D0 30 D1 31 A2 B0 C1 D0 34 D1 35 C3 D0 38 D1 39 B1 C1 D0 42 D1 43 C3 D0 46 D1 47 A3 B0 C1 D0 50 D1 51 C3 D0 54 D1 55 B1 C1 D0 58 D1 59 C3 D0 62 D1 63
To select just A0
and C1
values, I can do:
In [26]: df.loc['A0', :, 'C1', :] Out[26]: value first second third fourth A0 B0 C1 D0 2 D1 3 B1 C1 D0 10 D1 11
Which also works selecting from three levels, and even with tuples:
In [28]: df.loc['A0', :, ('C1', 'C2'), 'D1'] Out[28]: value first second third fourth A0 B0 C1 D1 3 C2 D1 5 B1 C1 D1 11 C2 D1 13
So far, intuitive and brilliant.
So why can't I select all values from the first index level?
In [30]: df.loc[:, :, 'C1', :] --------------------------------------------------------------------------- IndexingError Traceback (most recent call last) in () ----> 1 df.loc[:, :, 'C1', :] /usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key) 1176 def __getitem__(self, key): 1177 if type(key) is tuple: -> 1178 return self._getitem_tuple(key) 1179 else: 1180 return self._getitem_axis(key, axis=0) /usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup) 694 695 # no multi-index, so validate all of the indexers --> 696 self._has_valid_tuple(tup) 697 698 # ugly hack for GH #836 /usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_tuple(self, key) 125 for i, k in enumerate(key): 126 if i >= self.obj.ndim: --> 127 raise IndexingError('Too many indexers') 128 if not self._has_valid_type(k, i): 129 raise ValueError("Location based indexing can only have [%s] " IndexingError: Too many indexers
Surely this is not intended behaviour?
Note: I know this is possible with df.xs('C1', level='third')
but the current .loc
behaviour seems inconsistent.