Sorting Multi-Index to full depth (Pandas)

暖寄归人 2021-02-07 08:19

I have a dataframe which I'm loading from a csv file and then setting the index to a few of its columns (usually two or three) by the set_index method. The idea is to

3 Answers
  • It's not really clear what you are asking. Multi-index docs are here

    The OP needs to set the index, then sort in place

    df.set_index(['fileName','phrase'],inplace=True)
    df.sortlevel(inplace=True)
    

    Then access these levels via a tuple to get a specific result

    df.ix[('somePath','somePhrase')]
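
    On recent pandas versions the same idea can be sketched as below. As far as I know, sortlevel and .ix were deprecated and later removed (around pandas 1.0), so this uses sort_index and .loc instead. The column values and the "count" column are made up for illustration; only fileName and phrase come from the snippet above.

```python
import pandas as pd

# Hypothetical data standing in for the CSV from the question.
df = pd.DataFrame({
    "fileName": ["b.txt", "a.txt", "b.txt", "a.txt"],
    "phrase": ["world", "hello", "hello", "world"],
    "count": [4, 1, 3, 2],
})

# Set the two columns as a MultiIndex, then sort on all levels.
df = df.set_index(["fileName", "phrase"]).sort_index()

# A tuple selects one row; a single key selects everything under that first level.
row = df.loc[("a.txt", "world")]
subset = df.loc["b.txt"]

print(row)
print(subset)
```

    Reassigning the result rather than using inplace=True is only a style choice here; both set_index and sort_index return a new frame by default.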
    

    Maybe just give a toy example like this and show the specific result you want to get.

    In [1]: arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
       ...:           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
    
    In [2]: df = DataFrame(randn(8, 4), index=arrays)
    
    In [3]: df
    Out[3]: 
                    0         1         2         3
    bar one  1.654436  0.184326 -2.337694  0.625120
        two  0.308995  1.219156 -0.906315  1.555925
    baz one -0.180826 -1.951569  1.617950 -1.401658
        two  0.399151 -1.305852  1.530370 -0.132802
    foo one  1.097562  0.097126  0.387418  0.106769
        two  0.465681  0.270120 -0.387639 -0.142705
    qux one -0.656487 -0.154881  0.495044 -1.380583
        two  0.274045 -0.070566  1.274355  1.172247
    
    In [4]: df.index.lexsort_depth
    Out[4]: 2
    
    In [5]: df.ix[('foo','one')]
    Out[5]: 
    0    1.097562
    1    0.097126
    2    0.387418
    3    0.106769
    Name: (foo, one), dtype: float64
    
    In [6]: df.ix['foo']
    Out[6]: 
                0         1         2         3
    one  1.097562  0.097126  0.387418  0.106769
    two  0.465681  0.270120 -0.387639 -0.142705
    
    In [7]: df.ix[['foo']]
    Out[7]: 
                    0         1         2         3
    foo one  1.097562  0.097126  0.387418  0.106769
        two  0.465681  0.270120 -0.387639 -0.142705
    
    In [8]: df.sortlevel(level=1)
    Out[8]: 
                    0         1         2         3
    bar one  1.654436  0.184326 -2.337694  0.625120
    baz one -0.180826 -1.951569  1.617950 -1.401658
    foo one  1.097562  0.097126  0.387418  0.106769
    qux one -0.656487 -0.154881  0.495044 -1.380583
    bar two  0.308995  1.219156 -0.906315  1.555925
    baz two  0.399151 -1.305852  1.530370 -0.132802
    foo two  0.465681  0.270120 -0.387639 -0.142705
    qux two  0.274045 -0.070566  1.274355  1.172247
    
    In [10]: df.sortlevel(level=1).index.lexsort_depth
    Out[10]: 0
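
    The last two steps of the session show that sorting on level 1 alone leaves the index no longer lexsorted. A rough modern re-creation is below, assuming a recent pandas where is_monotonic_increasing is the recommended check (lexsort_depth is, I believe, no longer part of the public API).

```python
import numpy as np
import pandas as pd

# Same index layout as the toy example above; the values are random.
index = pd.MultiIndex.from_arrays([
    ["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"],
    ["one", "two", "one", "two", "one", "two", "one", "two"],
])
df = pd.DataFrame(np.random.randn(8, 4), index=index)

# Sorting with level 1 as the primary key breaks the lexicographic order.
by_second = df.sort_index(level=1)

# Sorting on all levels restores it.
fully_sorted = by_second.sort_index()

print(df.index.is_monotonic_increasing)
print(by_second.index.is_monotonic_increasing)
print(fully_sorted.index.is_monotonic_increasing)
```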
    
  • 2021-02-07 08:55

    Pandas provides:

    d = d.sort_index()
    print(d.index.is_lexsorted())  # Sometimes True
    

    which will do what you want in most cases. However, although this always sorts the index, it may still leave it not lexsorted (for example, if you have NaNs in the index), which generates a PerformanceWarning.

    To avoid this:

    d = d.sort_index(level=d.index.names)
    print(d.index.is_lexsorted())  # True
    

    ... though why there's a difference doesn't seem to be documented.
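
    For anyone on a newer pandas, here is a small sketch of the explicit-levels sort. It assumes the index levels are named (the names "outer" and "inner" and the data are invented) and checks the result with is_monotonic_increasing, since is_lexsorted was deprecated in the 1.x series and, as far as I can tell, is gone in 2.0. It does not reproduce the NaN case.

```python
import pandas as pd

# Hypothetical frame with a deliberately unsorted, named MultiIndex.
d = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [("b", 2), ("a", 2), ("b", 1), ("a", 1)],
        names=["outer", "inner"],
    ),
)

# Pass every level name explicitly, as in the answer above.
d = d.sort_index(level=list(d.index.names))

print(d)
print(d.index.is_monotonic_increasing)
```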

  • 2021-02-07 09:07

    I realize some time has passed, but I seem to have had the same problem as @idoda did: the accepted answer does not work when the dataframe has a MultiIndex on both the columns and the rows. The trick, not currently shown here, is that there is an "axis" option which defaults to zero but can also be set to 1.

    For example if you try:

    df.sortlevel(inplace=True,sort_remaining=True)
    

    and are still getting lexsort errors, it may be relevant to know that there is a default "axis=0" kwarg in there. Thus you can also try adding

    df.sortlevel(axis=1,inplace=True,sort_remaining=True)
    

    which should sort along the other axis. If you don't want to think about it, you can just brute-force it with:

    df.sortlevel(axis=0,inplace=True,sort_remaining=True)
    df.sortlevel(axis=1,inplace=True,sort_remaining=True)
    

    That should fully sort both the column and row indexes at all levels. I had the same problem here and couldn't get a full lexsort with the suggested answer, but a bit of research showed that even with "sort_remaining" set to True, sortlevel applies to only a single axis. These snippets work around that and appear to be the current idiomatic answer. Hope somebody finds it helpful!
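
    Since sortlevel is no longer available on recent pandas (as far as I know), the same two-axis sort can be sketched with sort_index, which also takes an axis argument. The labels and values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an unsorted MultiIndex on both rows and columns.
rows = pd.MultiIndex.from_tuples([("b", 2), ("a", 1), ("b", 1), ("a", 2)])
cols = pd.MultiIndex.from_tuples([("y", "q"), ("x", "q"), ("y", "p"), ("x", "p")])
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=rows, columns=cols)

# Sort the rows (axis=0), then the columns (axis=1), on all levels.
df = df.sort_index(axis=0).sort_index(axis=1)

print(df)
print(df.index.is_monotonic_increasing, df.columns.is_monotonic_increasing)
```

    Each sort_index call only touches the axis it is given, so both calls are needed when both axes carry a MultiIndex.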
