How to write/read a Pandas DataFrame with MultiIndex from/to an ASCII file?

庸人自扰 2021-02-14 10:25

I want to create a Pandas DataFrame that has a MultiIndex on both the rows and the columns, and read it from (and write it to) an ASCII text file. My data looks like:

<
2 answers
  •  野性不改
    2021-02-14 10:53

    I'm not sure which version of pandas you are using, but with 0.7.3 you can export your DataFrame to a TSV file and retain the indices like this:

    df.to_csv('mydf.tsv', sep='\t')
    

    You need to export to TSV rather than CSV because the column headers themselves contain commas, which would clash with a comma separator. This should solve the first part of your question.

    The second part is trickier because, as far as I can tell, you need to know in advance what you want your DataFrame to contain. In particular, you need to know:

    1. which columns of your TSV represent the row MultiIndex, and
    2. that the header labels of the remaining columns should also be converted to a MultiIndex.
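
As a point of comparison, here is a minimal round-trip sketch that assumes a much more recent pandas than the 0.7.3 discussed in this answer. In recent releases, `to_csv` writes one header row per column level, and `read_csv` accepts lists for `header` and `index_col`, which is where the two pieces of knowledge listed above get supplied. The sample labels, the level names and the temporary file path are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: three row-index levels and three column levels.
rows = pd.MultiIndex.from_tuples(
    [("a", "x", 1), ("a", "y", 2), ("b", "x", 3)],
    names=["r1", "r2", "r3"],
)
cols = pd.MultiIndex.from_tuples(
    [("A", "p", "u"), ("A", "q", "v"), ("B", "p", "w")],
    names=["one", "two", "three"],
)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], index=rows, columns=cols)

# Write a tab-separated file into a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "mydf.tsv")
df.to_csv(path, sep="\t")

# header=[0, 1, 2]    -> the first three lines hold the column levels
# index_col=[0, 1, 2] -> the first three fields hold the row levels
restored = pd.read_csv(path, sep="\t", header=[0, 1, 2], index_col=[0, 1, 2])

print(restored)
```

The reader still cannot guess how many header lines or index columns there are, so both lists have to be passed explicitly. Column labels come back as strings, so numeric column labels may need converting afterwards.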

    To illustrate this, let's read the TSV file we saved above back into a new DataFrame (this assumes read_table and MultiIndex have already been imported from pandas):

    In [1]: t_df = read_table('mydf.tsv', index_col=[0,1,2])
    In [2]: all(t_df.index == df.index)
    Out[2]: True
    

    So we managed to read mydf.tsv into a DataFrame that has the same row index as the original df. But:

    In [3]: all(t_df.columns == df.columns)
    Out[3]: False
    

    The reason is that pandas (as far as I can tell) has no way of parsing the header row correctly into a MultiIndex, so each column label comes back as a plain string. As mentioned above, if you know beforehand that your TSV file header represents a MultiIndex, you can do the following to fix this:

    In [4]: from ast import literal_eval
    In [5]: t_df.columns = MultiIndex.from_tuples(t_df.columns.map(literal_eval).tolist(), 
                                                  names=['one','two','three'])
    In [6]: all(t_df.columns == df.columns)
    Out[6]: True
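
The same fix can be sketched as a self-contained script. It assumes the file uses the older layout described in this answer, where a single header row stores each column label as the text of a tuple. The file contents below are hypothetical and held in memory so the example runs on its own; `literal_eval` turns each label string back into a real tuple before the MultiIndex is rebuilt.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical file contents in the older layout: one header row whose
# data-column labels are tuple reprs, preceded by three index columns.
tsv_text = (
    "r1\tr2\tr3\t('A', 'p', 'u')\t('A', 'q', 'v')\t('B', 'p', 'w')\n"
    "a\tx\t1\t1\t2\t3\n"
    "a\ty\t2\t4\t5\t6\n"
    "b\tx\t3\t7\t8\t9\n"
)

# The first three fields form the row MultiIndex; the column labels are
# still plain strings such as "('A', 'p', 'u')" at this point.
t_df = pd.read_csv(io.StringIO(tsv_text), sep="\t", index_col=[0, 1, 2])

# Parse each label string into a tuple, then rebuild the column MultiIndex.
tuples = [literal_eval(label) for label in t_df.columns]
t_df.columns = pd.MultiIndex.from_tuples(tuples, names=["one", "two", "three"])

print(t_df)
```

`literal_eval` only accepts Python literals, so it is a safer choice than `eval` for labels read from a file.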
    
