“Too many indexers” with DataFrame.loc

Anonymous (unverified), submitted 2019-12-03 01:47:02

Question:

I've read the docs about slicers a million times, but have never got my head round it, so I'm still trying to figure out how to use loc to slice a DataFrame with a MultiIndex.

I'll start with the DataFrame from this SO answer:

                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
             C3    D0          6
                   D1          7
      B1     C1    D0         10
                   D1         11
             C3    D0         14
                   D1         15
A1    B0     C1    D0         18
                   D1         19
             C3    D0         22
                   D1         23
      B1     C1    D0         26
                   D1         27
             C3    D0         30
                   D1         31
A2    B0     C1    D0         34
                   D1         35
             C3    D0         38
                   D1         39
      B1     C1    D0         42
                   D1         43
             C3    D0         46
                   D1         47
A3    B0     C1    D0         50
                   D1         51
             C3    D0         54
                   D1         55
      B1     C1    D0         58
                   D1         59
             C3    D0         62
                   D1         63
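For anyone who wants to reproduce this, here is a minimal sketch that rebuilds a frame like the one above. The construction is an assumption, not taken from the linked answer: it guesses that the values are simply 0..63 laid over the full product of the four levels, which does match every number printed here. The names `index` and `shown` are only illustrative.

```python
import numpy as np
import pandas as pd

# Assumed construction: full product of the four levels, values 0..63.
index = pd.MultiIndex.from_product(
    [
        ["A0", "A1", "A2", "A3"],
        ["B0", "B1"],
        ["C0", "C1", "C2", "C3"],
        ["D0", "D1"],
    ],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame({"value": np.arange(len(index))}, index=index)

# Keep only the C1 and C3 rows to get the 32 rows printed above.
shown = df[df.index.get_level_values("third").isin(["C1", "C3"])]
print(shown.head())
```

Under that assumption the full frame also contains C0 and C2 rows, which is why C2 can appear in later outputs.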

To select just A0 and C1 values, I can do:

In [26]: df.loc['A0', :, 'C1', :]
Out[26]:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
      B1     C1    D0         10
                   D1         11

Which also works selecting from three levels, and even with tuples:

In [28]: df.loc['A0', :, ('C1', 'C2'), 'D1']
Out[28]:
                           value
first second third fourth
A0    B0     C1    D1          3
             C2    D1          5
      B1     C1    D1         11
             C2    D1         13

So far, intuitive and brilliant.

So why can't I select all values from the first index level?

In [30]: df.loc[:, :, 'C1', :]
---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
 in ()
----> 1 df.loc[:, :, 'C1', :]

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in __getitem__(self, key)
   1176     def __getitem__(self, key):
   1177         if type(key) is tuple:
-> 1178             return self._getitem_tuple(key)
   1179         else:
   1180             return self._getitem_axis(key, axis=0)

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _getitem_tuple(self, tup)
    694
    695         # no multi-index, so validate all of the indexers
--> 696         self._has_valid_tuple(tup)
    697
    698         # ugly hack for GH #836

/usr/local/lib/python2.7/dist-packages/pandas/core/indexing.pyc in _has_valid_tuple(self, key)
    125         for i, k in enumerate(key):
    126             if i >= self.obj.ndim:
--> 127                 raise IndexingError('Too many indexers')
    128             if not self._has_valid_type(k, i):
    129                 raise ValueError("Location based indexing can only have [%s] "

IndexingError: Too many indexers

Surely this is not intended behaviour?

Note: I know this is possible with df.xs('C1', level='third') but the current .loc behaviour seems inconsistent.

Answer 1:

To be safe (in the sense that this will work in all cases), you need to index both the row index and the columns. pd.IndexSlice makes this easy:

In [26]: idx = pd.IndexSlice

In [27]: df.loc[idx[:, :, 'C1', :],:]
Out[27]:
                           value
first second third fourth
A0    B0     C1    D0          2
                   D1          3
      B1     C1    D0         10
                   D1         11
A1    B0     C1    D0         18
                   D1         19
      B1     C1    D0         26
                   D1         27
A2    B0     C1    D0         34
                   D1         35
      B1     C1    D0         42
                   D1         43
A3    B0     C1    D0         50
                   D1         51
      B1     C1    D0         58
                   D1         59

Here idx[:, :, 'C1', :] is an easier way to write the tuple (slice(None), slice(None), 'C1', slice(None)). Instead of pd.IndexSlice, you can also use np.s_, which is a bit shorter.
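To make the equivalence concrete, here is a small sketch. It assumes a frame built as the full product of the four levels with values 0..63, which is a guess that matches the numbers in the question rather than the asker's actual code.

```python
import numpy as np
import pandas as pd

# Assumed frame: full product of levels, values 0..63 (not the asker's code).
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C0", "C1", "C2", "C3"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame({"value": np.arange(len(index))}, index=index)

idx = pd.IndexSlice
key = idx[:, :, "C1", :]

# Both helpers simply hand back the tuple that the subscript syntax produces.
same_as_tuple = key == (slice(None), slice(None), "C1", slice(None))
same_as_np = np.s_[:, :, "C1", :] == key

# Row indexer first, then an explicit indexer for the columns.
result = df.loc[key, :]
print(same_as_tuple, same_as_np, len(result))
```

The trailing `, :` for the columns is what tells .loc that the whole tuple belongs to the row MultiIndex.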

I am not fully sure why the other cases work. But see the note in the documentation here: http://pandas.pydata.org/pandas-docs/stable/advanced.html#using-slicers (the first red warning box), where it is stated that:

You should specify all axes in the .loc specifier, meaning the indexer for the index and for the columns. There are some ambiguous cases where the passed indexer could be mis-interpreted as indexing both axes, rather than into say the MultiIndex for the rows.



Answer 2:

The reason this doesn't work is tied to the need to specify the axis of indexing (mentioned in http://pandas.pydata.org/pandas-docs/stable/advanced.html). An alternative solution to your problem is to simply do this:

df.loc(axis=0)[:, :, 'C1', :]

Pandas sometimes gets confused when a tuple key could apply either to the levels of the row MultiIndex or to both axes. If you had a column named 'C1', for example, you would also need to specify the axis with this style of slicing/selecting.
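As a rough check, the sketch below compares the axis=0 form with the df.xs call mentioned in the question. It assumes a frame built as the full product of the four levels with values 0..63, which is only a guess that matches the question's numbers.

```python
import numpy as np
import pandas as pd

# Assumed frame: full product of levels, values 0..63 (not the asker's code).
index = pd.MultiIndex.from_product(
    [["A0", "A1", "A2", "A3"], ["B0", "B1"], ["C0", "C1", "C2", "C3"], ["D0", "D1"]],
    names=["first", "second", "third", "fourth"],
)
df = pd.DataFrame({"value": np.arange(len(index))}, index=index)

# Pin the key to the row axis so it cannot be read as (rows, columns, ...).
by_axis = df.loc(axis=0)[:, :, "C1", :]

# xs drops the selected level by default; keep it so the two results line up.
by_xs = df.xs("C1", level="third", drop_level=False)

print(by_axis.equals(by_xs))
```

Both forms select the same rows here; the axis=0 version keeps the slicer syntax, while xs is limited to picking a single label per level.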


