I have a Dataframe with a pandas MultiIndex:
In [1]: import pandas as pd
In [2]: multi_index = pd.MultiIndex.from_product([[\'CAN\',\'USA\'],[\'total\']],nam
From version pandas 0.20.0+ use MultiIndex.remove_unused_levels:
print (df.index)
MultiIndex(levels=[['CAN', 'USA'], ['total']],
labels=[[1], [0]],
names=['country', 'sex'])
df.index = df.index.remove_unused_levels()
print (df.index)
MultiIndex(levels=[['USA'], ['total']],
labels=[[0], [0]],
names=['country', 'sex'])
I will be surprised if there is a more "built-in" way to eliminate the unused country than to re-create the index in the way you're doing (or some similar way). If you look at your index before and after the slice:
In [165]: df.index
Out[165]:
MultiIndex(levels=[[u'CAN', u'USA'], [u'total']],
labels=[[0, 1], [0, 0]],
names=[u'country', u'sex'])
In [166]: df = df.query('pop > 100')
In [167]: df.index
Out[167]:
MultiIndex(levels=[[u'CAN', u'USA'], [u'total']],
labels=[[1], [0]],
names=[u'country', u'sex'])
you can see that the labels - which are indexes into the level values - have updated but not the level values. This may be an imperfect analogy, but it strikes me that the level values are analogous to an enumerated column in a database table, while the labels are analogous to the actual values of rows in the table. If you delete all the rows in a table with a value of "CAN", it doesn't change the fact that "CAN" is still a valid choice based on the column definition. To remove "CAN" from the enumeration, you have to alter the column definition; this is the equivalent of reindexing the dataframe in pandas.
This is something that has bitten me before. Dropping columns or rows does NOT change the underlying MultiIndex, for performance and philosophical reasons, and this is officially not considered a bug (read more here). The short answer is that the developers say "that's not what the MultiIndex is for". If you need a list of the contents of a MultiIndex level after modification, for example for iteration or to check to see if something is included, you can use:
df.index.get_level_values(<levelname>)
This returns the current active values within that index level.
So I guess the "trick" here is that the API native way to do it is to use get_level_values instead of just .index or .columns