How to filter by sub-level index in Pandas

礼貌的吻别 2020-12-30 14:04

I have a DataFrame 'df' with a multilevel index (STK_ID, RPT_Date):

                       sales         cogs     net_pft
STK_ID RPT_Date                                 


        
1 Answer
  • 2020-12-30 14:45

    To use the "str.*" methods on RPT_Date, you can reset the index so that it becomes a regular column, filter the rows with a "str.*" method call, and then re-create the index.

    In [72]: x = df.reset_index(); x[x.RPT_Date.str.endswith("0630")].set_index(['STK_ID', 'RPT_Date'])
    Out[72]: 
                          sales        cogs    net_pft
    STK_ID RPT_Date                                   
    000876 20060630   857483000   729541000   67157200
           20070630  1146245000  1050808000  113468500
           20080630  1932470000  1777010000  133756300
    002254 20070630   501221000   289167000  118012200
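
    The same steps can be written as a standalone script. This is only a sketch: the index values are taken from the df.index output in this answer, but the "sales" numbers are placeholders, since the original frame is not reproduced in full.

```python
import pandas as pd

# Index values copied from the answer's df.index output (a subset);
# the "sales" numbers are placeholders, not the original data.
index = pd.MultiIndex.from_tuples(
    [
        ("000876", "20060331"), ("000876", "20060630"), ("000876", "20060930"),
        ("002254", "20070331"), ("002254", "20070630"), ("002254", "20070930"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame({"sales": range(len(index))}, index=index)

# 1. Move both index levels into ordinary columns.
flat = df.reset_index()
# 2. Build a boolean mask with a str.* method on the RPT_Date column.
mask = flat["RPT_Date"].str.endswith("0630")
# 3. Keep the matching rows and restore the two-level index.
result = flat[mask].set_index(["STK_ID", "RPT_Date"])
print(result)
```

    Splitting the one-liner into three named steps makes it easier to inspect the intermediate mask when the filter does not return what you expect.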
    

    However, this approach is not particularly fast.

    In [73]: timeit x = df.reset_index(); x[x.RPT_Date.str.endswith("0630")].set_index(['STK_ID', 'RPT_Date'])
    1000 loops, best of 3: 1.78 ms per loop
    

    Another approach builds on the fact that a MultiIndex object behaves much like a list of tuples.

    In [75]: df.index
    Out[75]: 
    MultiIndex
    [('000876', '20060331') ('000876', '20060630') ('000876', '20060930')
     ('000876', '20061231') ('000876', '20070331') ('000876', '20070630')
     ('000876', '20070930') ('000876', '20071231') ('000876', '20080331')
     ('000876', '20080630') ('000876', '20080930') ('002254', '20061231')
     ('002254', '20070331') ('002254', '20070630') ('002254', '20070930')]
    

    Building on that, you can create a boolean array from a MultiIndex with df.index.map() and use the result to filter the frame.

    In [76]: df[df.index.map(lambda x: x[1].endswith("0630"))]
    Out[76]: 
                          sales        cogs    net_pft
    STK_ID RPT_Date                                   
    000876 20060630   857483000   729541000   67157200
           20070630  1146245000  1050808000  113468500
           20080630  1932470000  1777010000  133756300
    002254 20070630   501221000   289167000  118012200
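
    As a self-contained sketch of the same idea: each element of the MultiIndex is a (STK_ID, RPT_Date) tuple, so key[1] is the date string. The "sales" values below are placeholders, and the np.asarray(..., dtype=bool) wrapper is a defensive addition (not part of the original one-liner) so the mask is a plain boolean array whatever index type map() returns.

```python
import numpy as np
import pandas as pd

# Index values copied from the answer's df.index output (a subset);
# the "sales" numbers are placeholders, not the original data.
index = pd.MultiIndex.from_tuples(
    [
        ("000876", "20060331"), ("000876", "20060630"), ("000876", "20060930"),
        ("002254", "20070331"), ("002254", "20070630"), ("002254", "20070930"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame({"sales": range(len(index))}, index=index)

# Iterating over a MultiIndex yields one tuple per row.
first_key = list(df.index)[0]

# Apply the predicate to each tuple; key[1] is the RPT_Date level.
mask = np.asarray(
    df.index.map(lambda key: key[1].endswith("0630")),
    dtype=bool,
)
filtered = df[mask]
print(filtered)
```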
    

    This is also quite a bit faster on this small frame (about 240 us versus 1.78 ms per loop).

    In [77]: timeit df[df.index.map(lambda x: x[1].endswith("0630"))]
    1000 loops, best of 3: 240 us per loop
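
    To repeat the comparison outside IPython, something like the following could be used. It is a rough sketch with the standard timeit module; the helper names via_reset_index and via_index_map are made up for this example, the "sales" values are placeholders, and the absolute timings will depend on the machine, the pandas version, and the size of the frame.

```python
import timeit

import numpy as np
import pandas as pd

# Index values copied from the answer's df.index output (a subset);
# the "sales" numbers are placeholders, not the original data.
index = pd.MultiIndex.from_tuples(
    [
        ("000876", "20060331"), ("000876", "20060630"), ("000876", "20060930"),
        ("002254", "20070331"), ("002254", "20070630"), ("002254", "20070930"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame({"sales": range(len(index))}, index=index)


def via_reset_index(frame):
    """Filter by resetting the index and using a column str.* method."""
    flat = frame.reset_index()
    keep = flat["RPT_Date"].str.endswith("0630")
    return flat[keep].set_index(["STK_ID", "RPT_Date"])


def via_index_map(frame):
    """Filter by mapping a predicate over the index tuples."""
    keep = np.asarray(
        frame.index.map(lambda key: key[1].endswith("0630")), dtype=bool
    )
    return frame[keep]


runs = 200  # arbitrary repeat count for this sketch
t_reset = timeit.timeit(lambda: via_reset_index(df), number=runs)
t_map = timeit.timeit(lambda: via_index_map(df), number=runs)
print(f"reset_index: {t_reset / runs * 1e6:.0f} us per loop")
print(f"index.map:   {t_map / runs * 1e6:.0f} us per loop")
```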
    