How do you update the levels of a pandas MultiIndex after slicing its DataFrame?

借酒劲吻你 2020-11-27 05:55

I have a DataFrame with a pandas MultiIndex:

In [1]: import pandas as pd
In [2]: multi_index = pd.MultiIndex.from_product([['CAN','USA'],['total']],nam


        
3 Answers
  • 2020-11-27 06:27

    From pandas 0.20.0 onward, use MultiIndex.remove_unused_levels:

    print(df.index)
    MultiIndex(levels=[['CAN', 'USA'], ['total']],
               labels=[[1], [0]],
               names=['country', 'sex'])
    
    df.index = df.index.remove_unused_levels()
    
    print(df.index)
    MultiIndex(levels=[['USA'], ['total']],
               labels=[[0], [0]],
               names=['country', 'sex'])
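
    For a version that can be run from start to finish, here is a minimal sketch. The population figures and the variable names (`sliced`, `levels_before`, `levels_after`) are made up for illustration, and a boolean mask stands in for the slice:

```python
import pandas as pd

# Hypothetical data: the population numbers are invented for this example.
index = pd.MultiIndex.from_product(
    [["CAN", "USA"], ["total"]], names=["country", "sex"]
)
df = pd.DataFrame({"pop": [35, 318]}, index=index)

# Slicing keeps the original levels, so 'CAN' is still listed.
sliced = df[df["pop"] > 100].copy()
levels_before = list(sliced.index.levels[0])

# remove_unused_levels returns a new MultiIndex, so assign it back.
sliced.index = sliced.index.remove_unused_levels()
levels_after = list(sliced.index.levels[0])

print(levels_before)
print(levels_after)
```

    Because the method returns a new index rather than modifying the existing one, the assignment back to `sliced.index` is what makes the change stick.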
    
  • 2020-11-27 06:35

    I would be surprised if there were a more "built-in" way to eliminate the unused country than re-creating the index the way you are doing (or in some similar way). If you look at your index before and after the slice:

    In [165]: df.index
    Out[165]:
    MultiIndex(levels=[[u'CAN', u'USA'], [u'total']],
               labels=[[0, 1], [0, 0]],
               names=[u'country', u'sex'])
    
    In [166]: df = df.query('pop > 100')
    
    In [167]: df.index
    Out[167]:
    MultiIndex(levels=[[u'CAN', u'USA'], [u'total']],
               labels=[[1], [0]],
               names=[u'country', u'sex'])
    

    you can see that the labels (which are positions into the level values) have been updated, but the level values have not. This may be an imperfect analogy, but it strikes me that the level values are analogous to an enumerated column in a database table, while the labels are analogous to the actual values of rows in the table. If you delete all the rows in a table with a value of "CAN", it doesn't change the fact that "CAN" is still a valid choice based on the column definition. To remove "CAN" from the enumeration, you have to alter the column definition; this is the equivalent of re-creating the index of the DataFrame in pandas.
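
    The split between the "enumeration" and the row values can be inspected directly. The sketch below assumes pandas 0.24 or later, where the `labels` shown above are exposed as `codes`; the data and the names `country_levels`, `country_codes` and `rebuilt` are illustrative only:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["CAN", "USA"], ["total"]], names=["country", "sex"]
)
df = pd.DataFrame({"pop": [35, 318]}, index=index)  # made-up numbers
sliced = df[df["pop"] > 100]

# The levels are the "column definition": both countries are still listed.
country_levels = list(sliced.index.levels[0])
# The codes are the "row values": integer positions into the levels.
country_codes = [int(c) for c in sliced.index.codes[0]]

# Rebuilding the index from its tuples derives the levels from scratch,
# which is one way of "altering the column definition".
rebuilt = pd.MultiIndex.from_tuples(
    sliced.index.tolist(), names=sliced.index.names
)
rebuilt_levels = list(rebuilt.levels[0])

print(country_levels, country_codes, rebuilt_levels)
```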

  • 2020-11-27 06:45

    This is something that has bitten me before. Dropping columns or rows does NOT change the underlying MultiIndex, for performance and philosophical reasons, and this is officially not considered a bug (read more here). The short answer is that the developers say "that's not what the MultiIndex is for". If you need the contents of a MultiIndex level after modification, for example to iterate over them or to check whether a value is present, you can use:

    df.index.get_level_values(<levelname>)
    

    This returns the values currently in use within that index level, one per row.

    So I guess the "trick" here is that the API-native way to do it is to use get_level_values instead of just .index or .columns.
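
    As a rough illustration of that approach, the sketch below uses invented data with a second `sex` value so that the slice keeps more than one row. Since get_level_values returns one entry per row, the result can contain duplicates; calling `.unique()` on it is one way to get the distinct values:

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["CAN", "USA"], ["total", "male"]], names=["country", "sex"]
)
df = pd.DataFrame({"pop": [35, 17, 318, 157]}, index=index)  # made-up numbers
sliced = df[df["pop"] > 100]

# One value per remaining row, so 'USA' appears twice here.
values = sliced.index.get_level_values("country")
per_row = list(values)

# Distinct values that are actually present after the slice.
present = list(values.unique())

# Membership test against the values in use, not the stale levels.
has_can = "CAN" in values

print(per_row, present, has_can)
```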
