I want to be able to create a pandas DataFrame with a MultiIndex on both the rows and the columns, and read it from an ASCII text file. My data looks like:
You can change the print options using set_option:
display.multi_sparse: boolean, default True. "Sparsify" MultiIndex display (don't display repeated elements in outer levels within groups).
Now the DataFrame will be printed as desired:
In [11]: pd.set_option('multi_sparse', False)
In [12]: df
Out[12]:
one              A   A   A   A   A   A   A   A   A  A2  A2  A2  A2  A2  A2  A2  A2  A2
two              B   B   B  B2  B2  B2  B3  B3  B3   B   B   B  B2  B2  B2  B3  B3  B3
three            C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3   C  C2  C3
n location sex
0 North    M     2   1   6   4   6   4   7   1   1   0   4   3   9   2   0   0   6   4
1 East     F     3   5   5   6   4   8   0   3   2   3   9   8   1   6   7   4   7   2
2 West     M     7   9   3   5   0   1   2   8   1   6   0   7   9   9   3   2   2   4
3 South    M     1   0   0   3   5   7   7   0   9   3   0   3   3   6   8   3   6   1
4 South    F     8   0   0   7   3   8   0   8   0   5   5   6   0   0   0   1   8   7
5 West     F     6   5   9   4   7   2   5   6   1   2   9   4   7   5   5   4   3   6
6 North    M     3   3   0   1   1   3   6   3   8   6   4   1   0   5   5   5   4   9
7 North    M     0   4   9   8   5   7   7   0   5   8   4   1   5   7   6   3   6   8
8 East     F     5   6   2   7   0   6   2   7   1   2   0   5   6   1   4   8   0   3
9 South    M     1   2   0   6   9   7   5   3   3   8   7   6   0   5   4   3   5   9
Note: in older pandas versions this was pd.set_printoptions(multi_sparse=False).
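If you only want the non-sparse display for a single print, a scoped variant may be handy. The following is a minimal sketch, assuming a reasonably recent pandas where pd.option_context is available; the small DataFrame is made up for illustration and is not the data from the question.

```python
import pandas as pd

# Hypothetical two-level column index, just large enough to show the effect.
columns = pd.MultiIndex.from_tuples(
    [("A", "B"), ("A", "B2"), ("A2", "B")], names=["one", "two"]
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=columns)

# Default display: repeated outer labels are left blank ("sparsified").
sparse_text = repr(df)

# Inside the context manager every label is repeated; the previous
# setting is restored automatically when the block exits.
with pd.option_context("display.multi_sparse", False):
    dense_text = repr(df)

print(sparse_text)
print(dense_text)
```

Unlike set_option, this leaves the global setting untouched once the with block ends.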
Not sure which version of pandas you are using, but with 0.7.3 you can export your DataFrame to a TSV file and retain the indices by doing this:
df.to_csv('mydf.tsv', sep='\t')
You need to export to TSV rather than CSV because the column headers contain , characters. This should solve the first part of your question.
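For readers who want to try the export without the original data, here is a minimal sketch that builds a small DataFrame with a MultiIndex on both axes and writes it as TSV. The values and the in-memory buffer are assumptions for illustration; the exact header layout that to_csv produces can differ between pandas versions.

```python
import io
import pandas as pd

# Hypothetical miniature of the DataFrame from the question.
columns = pd.MultiIndex.from_tuples(
    [("A", "B", "C"), ("A", "B", "C2"), ("A2", "B", "C")],
    names=["one", "two", "three"],
)
index = pd.MultiIndex.from_tuples(
    [(0, "North", "M"), (1, "East", "F")],
    names=["n", "location", "sex"],
)
df = pd.DataFrame([[2, 1, 6], [3, 5, 5]], index=index, columns=columns)

# Write to an in-memory buffer instead of 'mydf.tsv' so nothing touches disk.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t")
tsv_text = buffer.getvalue()

print(tsv_text)
```

Printing tsv_text is a quick way to see how your pandas version lays out the header rows before you try to read the file back.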
The second part is trickier because, as far as I can tell, you need to know in advance what you want your DataFrame to contain. In particular, you need to know whether the leading columns of the TSV file represent a row MultiIndex and whether the header row represents a column MultiIndex.
To illustrate this, let's read back the TSV file we saved above into a new DataFrame:
In [1]: t_df = pd.read_table('mydf.tsv', index_col=[0,1,2])
In [2]: all(t_df.index == df.index)
Out[2]: True
So we managed to read mydf.tsv into a DataFrame that has the same row index as the original df. But:
In [3]: all(t_df.columns == df.columns)
Out[3]: False
The reason is that pandas (as far as I can tell) has no way of parsing the header row correctly into a MultiIndex. As I mentioned above, if you know beforehand that your TSV file header represents a MultiIndex, then you can do the following to fix this:
In [4]: from ast import literal_eval
In [5]: t_df.columns = pd.MultiIndex.from_tuples(t_df.columns.map(literal_eval).tolist(),
                                                 names=['one','two','three'])
In [6]: all(t_df.columns == df.columns)
Out[6]: True
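The manual fix above was needed on 0.7.3. In more recent pandas versions (an assumption about your environment, not something this answer was tested against), read_csv accepts a list of row numbers for header, which builds the column MultiIndex directly. A minimal sketch of the round trip, using a made-up DataFrame and an in-memory buffer rather than mydf.tsv:

```python
import io
import pandas as pd

# Hypothetical miniature of the DataFrame from the question.
columns = pd.MultiIndex.from_tuples(
    [("A", "B", "C"), ("A", "B", "C2"), ("A2", "B", "C")],
    names=["one", "two", "three"],
)
index = pd.MultiIndex.from_tuples(
    [(0, "North", "M"), (1, "East", "F")],
    names=["n", "location", "sex"],
)
df = pd.DataFrame([[2, 1, 6], [3, 5, 5]], index=index, columns=columns)

# Export as TSV into a buffer, then rewind it for reading.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t")
buffer.seek(0)

# header=[0, 1, 2]: the first three rows form the column MultiIndex.
# index_col=[0, 1, 2]: the first three columns form the row MultiIndex.
t_df = pd.read_csv(buffer, sep="\t", header=[0, 1, 2], index_col=[0, 1, 2])

print(list(t_df.columns) == list(df.columns))
print(list(t_df.index) == list(df.index))
```

You still have to know how many header rows and index columns the file contains, so the point about knowing the layout beforehand stands.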