I have a dataframe which I'm loading from a CSV file and then setting the index to a few of its columns (usually two or three) with the set_index
method. The idea is to
It's not really clear what you are asking. Multi-index docs are here
The OP needs to set the index, then sort in place
df.set_index(['fileName','phrase'],inplace=True)
df.sortlevel(inplace=True)
Then access these levels via a tuple to get a specific result
df.ix[('somePath','somePhrase')]
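Note that sortlevel and .ix are no longer available in current pandas releases. A minimal sketch of the same set-index, sort, then look-up-by-tuple pattern using sort_index and .loc follows; the column names come from the snippet above, while the file names, phrases and the "count" column are made-up illustration data.

```python
import pandas as pd

# Hypothetical data standing in for the CSV file from the question.
df = pd.DataFrame({
    "fileName": ["b.txt", "a.txt", "b.txt", "a.txt"],
    "phrase": ["world", "hello", "hello", "world"],
    "count": [4, 1, 3, 2],
})

# Set the MultiIndex, then sort it so tuple lookups are fast and warning-free.
df = df.set_index(["fileName", "phrase"]).sort_index()

# Access one row by giving a value for every index level as a tuple.
row = df.loc[("a.txt", "world")]
print(row["count"])
```

Here sort_index() with no arguments sorts on all levels of the row index, which is what sortlevel(inplace=True) did in the older API.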
Maybe just give a toy example like this and show the specific result you want to get.
In [1]: arrays = [np.array(['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux']),
   ...:           np.array(['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two'])]
In [2]: df = DataFrame(randn(8, 4), index=arrays)
In [3]: df
Out[3]:
                0         1         2         3
bar one  1.654436  0.184326 -2.337694  0.625120
    two  0.308995  1.219156 -0.906315  1.555925
baz one -0.180826 -1.951569  1.617950 -1.401658
    two  0.399151 -1.305852  1.530370 -0.132802
foo one  1.097562  0.097126  0.387418  0.106769
    two  0.465681  0.270120 -0.387639 -0.142705
qux one -0.656487 -0.154881  0.495044 -1.380583
    two  0.274045 -0.070566  1.274355  1.172247
In [4]: df.index.lexsort_depth
Out[4]: 2
In [5]: df.ix[('foo','one')]
Out[5]:
0 1.097562
1 0.097126
2 0.387418
3 0.106769
Name: (foo, one), dtype: float64
In [6]: df.ix['foo']
Out[6]:
            0         1         2         3
one  1.097562  0.097126  0.387418  0.106769
two  0.465681  0.270120 -0.387639 -0.142705
In [7]: df.ix[['foo']]
Out[7]:
                0         1         2         3
foo one  1.097562  0.097126  0.387418  0.106769
    two  0.465681  0.270120 -0.387639 -0.142705
In [8]: df.sortlevel(level=1)
Out[8]:
                0         1         2         3
bar one  1.654436  0.184326 -2.337694  0.625120
baz one -0.180826 -1.951569  1.617950 -1.401658
foo one  1.097562  0.097126  0.387418  0.106769
qux one -0.656487 -0.154881  0.495044 -1.380583
bar two  0.308995  1.219156 -0.906315  1.555925
baz two  0.399151 -1.305852  1.530370 -0.132802
foo two  0.465681  0.270120 -0.387639 -0.142705
qux two  0.274045 -0.070566  1.274355  1.172247
In [10]: df.sortlevel(level=1).index.lexsort_depth
Out[10]: 0
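The session above can be approximated on a current pandas install roughly as follows. This is a sketch under assumptions: .loc stands in for .ix, sort_index(level=1) stands in for sortlevel(level=1), and is_monotonic_increasing is used as the sortedness check in place of lexsort_depth, which recent releases no longer expose publicly.

```python
import numpy as np
import pandas as pd

arrays = [
    np.array(["bar", "bar", "baz", "baz", "foo", "foo", "qux", "qux"]),
    np.array(["one", "two", "one", "two", "one", "two", "one", "two"]),
]
df = pd.DataFrame(np.random.randn(8, 4), index=arrays)

# A scalar key on the first level drops that level from the result.
dropped = df.loc["foo"]
# A list key keeps both index levels.
kept = df.loc[["foo"]]

# Sorting primarily on the second level leaves the index
# out of lexicographic order when read from the first level.
by_level1 = df.sort_index(level=1)

print(df.index.is_monotonic_increasing)         # fully sorted index
print(by_level1.index.is_monotonic_increasing)  # no longer fully sorted
print(by_level1.sort_index().index.is_monotonic_increasing)  # sorted again
```

The point is the same as in the session: an index ordered by its second level first is not usable for fast tuple lookups until it is sorted again on all levels.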
Pandas provides:
d = d.sort_index()
print(d.index.is_lexsorted())  # Sometimes true
which will do what you want in most cases. However, this always sorts the index but may leave it not 'lexsorted' (for example, if you have NaNs in the index), which generates a PerformanceWarning.
To avoid this:
d = d.sort_index(level=d.index.names)
print(d.index.is_lexsorted())  # true
... though why there's a difference doesn't seem to be documented.
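One way to wrap this up is a small helper that sorts on every level explicitly, and only when the index is not already sorted. This is a sketch, not part of pandas: ensure_sorted is a hypothetical name, levels are passed by position rather than by name so that unnamed levels also work, and is_monotonic_increasing is used as the check because recent pandas releases deprecate is_lexsorted. The NaN case mentioned above is not exercised here.

```python
import pandas as pd

def ensure_sorted(d):
    """Return d with its row index sorted on all levels (hypothetical helper)."""
    if d.index.is_monotonic_increasing:
        return d  # already sorted, nothing to do
    # Name every level explicitly, by position, as the answer above suggests.
    return d.sort_index(level=list(range(d.index.nlevels)))

d = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.MultiIndex.from_tuples(
        [("b", 2), ("a", 2), ("b", 1), ("a", 1)], names=["key", "num"]
    ),
)
d = ensure_sorted(d)
print(d.index.is_monotonic_increasing)
```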
I realize some time has passed, but I seem to have had the same problem as @idoda: the accepted answer did not work on dataframes that have a MultiIndex on both the columns and the rows. The trick, not currently shown here, is that there is an "axis" option which defaults to 0 but can also be set to 1.
For example if you try:
df.sortlevel(inplace=True,sort_remaining=True)
and are still getting lexsort errors, it may be relevant to know that there is a default "axis=0" kwarg in there. Thus you can also try adding
df.sortlevel(axis=1,inplace=True,sort_remaining=True)
Which should sort the other direction. If you don't want to think about it, you can just brute force it with:
df.sortlevel(axis=0,inplace=True,sort_remaining=True)
df.sortlevel(axis=1,inplace=True,sort_remaining=True)
That should fully sort both the column and row indexes at all levels. I had the same problem and couldn't get a full lexsort with the suggested answer, but a bit of research showed that even with "sort_remaining" set to True, sortlevel applies to only a single axis. These snippets solve that and appear to be the current native, pythonic answer. Hope somebody finds it helpful!
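For readers on a pandas version where sortlevel is gone, the same two-axis sort can be sketched with sort_index, which also takes an axis argument. The frame below is made-up illustration data with a MultiIndex on both rows and columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with unsorted MultiIndexes on both axes.
rows = pd.MultiIndex.from_tuples([("b", 2), ("a", 1), ("b", 1), ("a", 2)])
cols = pd.MultiIndex.from_tuples([("y", "hi"), ("x", "lo"), ("x", "hi")])
df = pd.DataFrame(np.arange(12).reshape(4, 3), index=rows, columns=cols)

# Sort the row index first, then the column index; each call handles one axis.
df_sorted = df.sort_index(axis=0).sort_index(axis=1)

print(df_sorted.index.is_monotonic_increasing)
print(df_sorted.columns.is_monotonic_increasing)

# With both axes sorted, a (row tuple, column tuple) lookup is straightforward.
print(df_sorted.loc[("a", 1), ("x", "hi")])
```

Each sort_index call touches only the axis it is given, which is why both calls are needed.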