Select rows in pandas MultiIndex DataFrame

清歌不尽 2020-11-22 02:58

What are the most common pandas ways to select/filter rows of a dataframe whose index is a MultiIndex?

  • Slicing based on a single value/label
2 Answers
  •  别那么骄傲
    2020-11-22 03:57

    I recently had a dataframe with a multi-index of three or more levels, and I couldn't get any of the solutions above to produce the result I was looking for. They may well work for my use case; I tried several, but could not get them working in the time I had available.

    I am far from an expert, but I stumbled across an approach that is not listed in the comprehensive answers above. I make no claim that it is in any way optimal.

    It is a different way to get a slightly different result from Question #6 above (and likely from other questions as well).

    Specifically I was looking for:

    1. A way to choose two or more values from one level of the index and a single value from another level, and
    2. A way to keep the index levels used for the selection in the dataframe output.

    One complication (entirely fixable, however): the index levels were unnamed.

    On the toy dataframe below:

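        import pandas as pd

        # Three unnamed index levels, built as a full product of the labels
        index = pd.MultiIndex.from_product([['a', 'b'],
                                            ['stock1', 'stock2', 'stock3'],
                                            ['price', 'volume', 'velocity']])

        df = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9,
                           10, 11, 12, 13, 14, 15, 16, 17, 18],
                          index=index)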
    
                            0
        a stock1 price      1
                 volume     2
                 velocity   3
          stock2 price      4
                 volume     5
                 velocity   6
          stock3 price      7
                 volume     8
                 velocity   9
        b stock1 price     10
                 volume    11
                 velocity  12
          stock2 price     13
                 volume    14
                 velocity  15
          stock3 price     16
                 volume    17
                 velocity  18
    

    The following works, of course, but it drops the selected levels from the result:

        df.xs(('stock1', 'velocity'), level=(1,2))
    
            0
        a   3
        b  12
    

    But I wanted the result to keep all of the index levels, so I combined boolean masks built with df.index.isin:

        # Each isin call returns a boolean array; & keeps rows matching both
        df.iloc[df.index.isin(['stock1'], level=1) &
                df.index.isin(['velocity'], level=2)]
    
                            0
        a stock1 velocity   3
        b stock1 velocity  12
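
For this single-value case, the same rows with all levels kept can also be obtained from xs by passing drop_level=False. The sketch below rebuilds the toy dataframe so that it is self-contained and compares the two approaches; the variable names kept and masked are only for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['a', 'b'],
                                    ['stock1', 'stock2', 'stock3'],
                                    ['price', 'volume', 'velocity']])
df = pd.DataFrame(list(range(1, 19)), index=index)

# drop_level=False keeps the levels that xs would otherwise remove
kept = df.xs(('stock1', 'velocity'), level=(1, 2), drop_level=False)

# The boolean-mask version of the same selection
mask = df.index.isin(['stock1'], level=1) & df.index.isin(['velocity'], level=2)
masked = df.loc[mask]

print(kept)
```

Note that xs takes one label per level, so it does not cover selecting several values from the same level.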
    

    To select two or more values from one level and one (or more) values from another level:

        df.iloc[df.index.isin(['stock1','stock3'], level=1) & 
                df.index.isin(['velocity'], level=2)] 
    
                            0
        a stock1 velocity   3
          stock3 velocity   9
        b stock1 velocity  12
          stock3 velocity  18
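
If the unnamed levels are a nuisance, they can be named first and then referred to by name instead of by position. The sketch below is one way to do that; the level names group, stock and metric and the column name value are assumptions made up for this example, not part of the original data.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['a', 'b'],
                                    ['stock1', 'stock2', 'stock3'],
                                    ['price', 'volume', 'velocity']])
df = pd.DataFrame({'value': range(1, 19)}, index=index)

# Assumed names, chosen only for this illustration
df.index = df.index.set_names(['group', 'stock', 'metric'])

# isin accepts a level name as well as a level position
by_name = df.loc[df.index.isin(['stock1', 'stock3'], level='stock')
                 & df.index.isin(['velocity'], level='metric')]

# With named levels, query can reference them like columns
by_query = df.query("stock in ['stock1', 'stock3'] and metric == 'velocity'")

print(by_name)
```

Both selections keep all three index levels in the output.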
    

    This method is probably a bit clunky, but it met my needs and, as a bonus, was easier for me to read and understand.
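
If the repeated isin calls feel clunky, they can be wrapped in a small function. The helper below, select_levels, is hypothetical (it is not a pandas function) and is only a minimal sketch: it takes a dict that maps a level position or name to the list of labels to keep on that level.

```python
import numpy as np
import pandas as pd


def select_levels(frame, wanted):
    """Hypothetical helper: keep rows whose index labels match `wanted`.

    `wanted` maps a level (position or name) to a list of labels.
    All index levels are kept in the result.
    """
    mask = np.ones(len(frame), dtype=bool)
    for level, labels in wanted.items():
        # Narrow the mask one level at a time
        mask &= frame.index.isin(labels, level=level)
    return frame.loc[mask]


index = pd.MultiIndex.from_product([['a', 'b'],
                                    ['stock1', 'stock2', 'stock3'],
                                    ['price', 'volume', 'velocity']])
df = pd.DataFrame(list(range(1, 19)), index=index)

result = select_levels(df, {1: ['stock1', 'stock3'], 2: ['velocity']})
print(result)
```

An empty dict returns the dataframe unchanged, since the mask starts out all True.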
