How to slice one MultiIndex DataFrame with the MultiIndex of another

半阙折子戏 2020-12-05 19:27

I have a pandas dataframe with 3 levels of a MultiIndex. I am trying to pull out rows of this dataframe according to a list of values that correspond to two of the levels.
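
The example frame from the question is not reproduced on this page. Judging from the outputs in the answers below, it can be rebuilt roughly as follows. The construction (from_product, the order of the values, the hi column counting from 0) is an assumption inferred from those outputs, not the asker's original code.

```python
import pandas as pd

# Assumed reconstruction: three index levels a, b, c and one column hi.
# The value order ("foo" before "bar", "baz" before "can") is chosen so that
# the hi values line up with the outputs shown in the answers.
index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["foo", "bar"], ["baz", "can"]],
    names=["a", "b", "c"],
)
df = pd.DataFrame({"hi": range(12)}, index=index)

# The (b, c) pairs to select, held in a second, two-level MultiIndex.
ix_use = pd.MultiIndex.from_tuples(
    [("foo", "can"), ("bar", "baz")], names=["b", "c"]
)

print(df)
```

With this frame, the goal is to keep the rows whose (b, c) pair appears in ix_use, for every value of a.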

3 Answers
  • 2020-12-05 19:44

    Here is a way to get this slice:

    df.sort_index(inplace=True)
    idx = pd.IndexSlice
    df.loc[idx[:, ('foo','bar'), 'can'], :]
    

    yielding

               hi
    a b   c      
    1 bar can   3
      foo can   1
    2 bar can   7
      foo can   5
    3 bar can  11
      foo can   9
    

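    Note that you might need to sort the MultiIndex before you can slice it. pandas raises an error if the index is not sorted deeply enough for the slice (recent versions raise pandas.errors.UnsortedIndexError, a KeyError subclass, with different wording):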

    KeyError: 'MultiIndex Slicing requires the index to be fully lexsorted tuple len (3), lexsort depth (1)'
    

    You can read more on how to use slicers in the pandas docs, in the "Using slicers" section of the MultiIndex / advanced indexing guide.
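
    As a self-contained sketch of the slicer approach: the frame below is an assumed reconstruction of the question's data (inferred from the outputs on this page), and a list is used for the b values, which is the form the pandas docs show for slicers.

```python
import pandas as pd

# Assumed reconstruction of the example frame (not the asker's original code).
index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["foo", "bar"], ["baz", "can"]],
    names=["a", "b", "c"],
)
df = pd.DataFrame({"hi": range(12)}, index=index)

# Sort a copy first so the index is lexsorted on all three levels.
df_sorted = df.sort_index()

# Every a, b in {foo, bar}, c == can. The trailing ":" selects all columns.
idx = pd.IndexSlice
result = df_sorted.loc[idx[:, ["foo", "bar"], "can"], :]
print(result)
```

    Sorting a copy with sort_index() instead of inplace=True leaves the original row order available, which matters if later steps rely on it.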

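    If for some reason slicers are not an option, here is a way to get the same slice using the .isin() method on each level, where ix_use is the two-level MultiIndex holding the b and c values to select. Like the slicer, this keeps every combination of the listed b and c values rather than specific (b, c) pairs:

    df[df.index.get_level_values('b').isin(ix_use.get_level_values(0))
       & df.index.get_level_values('c').isin(ix_use.get_level_values(1))]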
    

    Which is clearly not as concise.

    UPDATE:

    For the conditions in your updated question, here is a way to do it:

    cond1 = (df.index.get_level_values('b').isin(['foo'])) & (df.index.get_level_values('c').isin(['can']))
    cond2 = (df.index.get_level_values('b').isin(['bar'])) & (df.index.get_level_values('c').isin(['baz']))
    df[cond1 | cond2]
    

    producing:

               hi
    a b   c      
    1 foo can   1
      bar baz   2
    2 foo can   5
      bar baz   6
    3 foo can   9
      bar baz  10
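
    The cond1 | cond2 pattern above gets long as the number of pairs grows. One possible generalization, which is not part of the original answer, is to drop level a and test whole (b, c) tuples with MultiIndex.isin. The frame and ix_use below are assumed reconstructions of the question's data.

```python
import pandas as pd

# Assumed reconstruction of the example frame and of the pairs to select.
index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["foo", "bar"], ["baz", "can"]],
    names=["a", "b", "c"],
)
df = pd.DataFrame({"hi": range(12)}, index=index)
ix_use = pd.MultiIndex.from_tuples(
    [("foo", "can"), ("bar", "baz")], names=["b", "c"]
)

# Level-by-level isin matches any combination of the listed values,
# so with these pairs it keeps every row of df.
per_level = (
    df.index.get_level_values("b").isin(ix_use.get_level_values("b"))
    & df.index.get_level_values("c").isin(ix_use.get_level_values("c"))
)

# Pairwise test: remove level a, then compare (b, c) tuples against ix_use.
pairwise = df.index.droplevel("a").isin(ix_use)

result = df[pairwise]
print(result)
```

    Because this is a boolean mask, the rows come back in the order they have in df and no sorting of the index is needed.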
    
  • 2020-12-05 19:50

    I find it interesting that this doesn't work:

    In [45]: df.loc[(idx[:, 'foo', 'can'], idx[:, 'bar', 'baz']), ]
    Out[45]: 
               hi
    a b   c      
    1 bar baz   2
          can   3
      foo baz   0
          can   1
    2 bar baz   6
          can   7
      foo baz   4
          can   5
    3 bar baz  10
          can  11
      foo baz   8
          can   9
    

    It sort of looks like it "should", somehow. In any case, here's a reasonable workaround:

    Let's assume the tuples you want to slice by are in the index of another DataFrame (since it sounds like they probably are in your case!).

    In [53]: ix_use = pd.MultiIndex.from_tuples([('foo', 'can'), ('bar', 'baz')], names=['b', 'c'])
    In [55]: other = pd.DataFrame(dict(a=1), index=ix_use)
    In [56]: other
    Out[56]: 
             a
    b   c     
    foo can  1
    bar baz  1
    

    Now to slice df by the index of other, we can use the fact that .loc/.ix allow you to give a list of tuples (see the last example here). Note that .ix was deprecated in pandas 0.20 and removed in 1.0, so use .loc on current versions.

    First let's build the list of tuples we want:

    In [13]: idx = [(x, ) + y for x in df.index.levels[0] for y in other.index.values]
    In [14]: idx
    Out[14]: 
    [(1, 'foo', 'can'),
     (1, 'bar', 'baz'),
     (2, 'foo', 'can'),
     (2, 'bar', 'baz'),
     (3, 'foo', 'can'),
     (3, 'bar', 'baz')]
    

    Now we can pass this list to .loc (or to .ix on old pandas versions that still have it):

    In [17]: df.loc[idx]
    Out[17]: 
               hi
    a b   c      
    1 foo can   1
      bar baz   2
    2 foo can   5
      bar baz   6
    3 foo can   9
      bar baz  10
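
    Pulled together as one script for a pandas version without .ix, the steps above might look like this. The frame is an assumed reconstruction of the question's data, and the variable name keys is used instead of idx only to avoid clashing with the pd.IndexSlice alias from the other answer.

```python
import pandas as pd

# Assumed reconstruction of the example frame (not the asker's original code).
index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["foo", "bar"], ["baz", "can"]],
    names=["a", "b", "c"],
)
df = pd.DataFrame({"hi": range(12)}, index=index)

# The (b, c) pairs live in the index of another DataFrame.
ix_use = pd.MultiIndex.from_tuples(
    [("foo", "can"), ("bar", "baz")], names=["b", "c"]
)
other = pd.DataFrame({"a": 1}, index=ix_use)

# Prefix every (b, c) pair with every value of level a to get full 3-tuples.
keys = [(a,) + pair for a in df.index.levels[0] for pair in other.index]

# .loc accepts a list of full-depth tuples and returns rows in that order.
result = df.loc[keys]
print(result)
```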
    
  • 2020-12-05 19:53

    I would recommend the query() method, just like in this Q&A.

    You can simply write the condition as a query string, which I think is a more natural way to express it:

    In [27]: df.query("(b == 'foo' and c == 'can') or (b == 'bar' and c == 'baz')")
    Out[27]: 
               hi
    a b   c      
    1 foo can   1
      bar baz   2
    2 foo can   5
      bar baz   6
    3 foo can   9
      bar baz  10
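
    If the pairs come from another MultiIndex rather than being typed by hand, the query string can be assembled from them. This is only a sketch: the frame and ix_use are assumed reconstructions of the question's data, and building the expression with repr() is reasonable for simple string labels like these but not for arbitrary values.

```python
import pandas as pd

# Assumed reconstruction of the example frame and of the pairs to select.
index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["foo", "bar"], ["baz", "can"]],
    names=["a", "b", "c"],
)
df = pd.DataFrame({"hi": range(12)}, index=index)
ix_use = pd.MultiIndex.from_tuples(
    [("foo", "can"), ("bar", "baz")], names=["b", "c"]
)

# One "(b == ... and c == ...)" clause per pair, joined with "or".
# query() can refer to index levels by their names.
expr = " or ".join(f"(b == {b!r} and c == {c!r})" for b, c in ix_use)
print(expr)

result = df.query(expr)
print(result)
```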
    