Preserving column order in Python Pandas DataFrame

南笙 2020-12-03 06:22

Is there a way to preserve the order of the columns in a CSV file when reading it and then writing it back with Python Pandas? For example, in this code:

import pandas as pd

         


        
3 Answers
  • 2020-12-03 07:06

    Another workaround is to do this:

    import pandas as pd
    data = pd.read_csv(filename)
    data2 = data[['A','B','C']]  # list 'A', 'B', 'C' in the desired order
    data2.to_csv(filename)
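
As a rough, self-contained sketch of the same workaround: the snippet below uses an in-memory string in place of the file (the `csv_text` sample and its values are invented for illustration) and passes `index=False`, which is an addition not in the answer above, so that the row index is not written as an extra column.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the CSV file in the answer.
csv_text = "C,A,B\n3,1,2\n6,4,5\n"

data = pd.read_csv(io.StringIO(csv_text))

# Select the columns in the desired order; values move with their labels.
data2 = data[["A", "B", "C"]]

# With no path given, to_csv returns the CSV text as a string.
# index=False keeps the row index out of the output.
reordered_csv = data2.to_csv(index=False)
print(reordered_csv)
```

Without `index=False`, the default row index is written as an unnamed first column, which then shows up as an extra column the next time the file is read.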
    
  • 2020-12-03 07:20

    The column order should generally be preserved when you read and then write a CSV file like that. If for some reason the columns are not in the order you want, you can set the order explicitly with the columns keyword argument of to_csv.

    For example, if you have a csv with columns a, b, c, d:

    data = pd.read_csv(filename)
    data.to_csv(filename, columns=['a', 'b', 'c', 'd'])
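
A minimal sketch of both behaviours, assuming a recent pandas release: a plain round trip keeps the input order, and `columns=` imposes a different one. The sample data and the variable names `round_trip` and `ordered` are made up for illustration, and an in-memory string stands in for the file.

```python
import io
import pandas as pd

# Hypothetical CSV whose columns are deliberately not in alphabetical order.
csv_text = "d,b,a,c\n4,2,1,3\n8,6,5,7\n"
data = pd.read_csv(io.StringIO(csv_text))

# Plain round trip: the columns come back out in the order they were read.
round_trip = data.to_csv(index=False)

# Explicit order: the columns keyword selects and orders the output columns.
ordered = data.to_csv(index=False, columns=["a", "b", "c", "d"])

print(round_trip)
print(ordered)
```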
    
  • 2020-12-03 07:22

    There appears to be a bug in the current version of Pandas ('0.11.0'), which means that Matti John's answer will not work. If you specify columns for writing to file, the data are still written in alphabetical column order, and only the header is relabelled according to the list in cols. For example, this code:

    import pandas
    dfdict={}
    dfdict["a"]=[1,2,3,4]
    dfdict["b"]=[5,6,7,8]
    dfdict["c"]=[9,10,11,12]
    df=pandas.DataFrame(dfdict)
    df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"])
    

    results in this (incorrect) output:

        b   a   c
    0   1   5   9
    1   2   6   10
    2   3   7   11
    3   4   8   12
    

    You can check which version of pandas you have installed by executing:

    pandas.__version__
    

    The documentation for to_csv is in the pandas API reference under DataFrame.to_csv.

    Actually, it seems that this is a known bug and will be fixed in an upcoming release (0.11.1):

    https://github.com/pydata/pandas/issues/3489

    UPDATE: There still hasn't been a new release of pandas, but there is a workaround described here, which doesn't require using a different version of pandas:

    https://github.com/pydata/pandas/issues/3454

    So changing the last line in the block of code above to the following will work correctly:

    df.to_csv("dfTest.txt","\t",header=True,cols=["b","a","c"], engine='python')
    

    UPDATE: It seems that the argument "cols" has been renamed to "columns" and that the argument "engine" is no longer available in recent versions of pandas. Also, this bug is fixed in version 0.19.0.
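
For readers on a recent pandas, the reproduction above might be written as follows. This is a sketch, not the original author's code: it assumes a version where `columns` is the keyword name, passes the separator as `sep=`, and writes to an in-memory buffer instead of `dfTest.txt`. The read-back step (`check`) is added only to confirm that values stay with their labels.

```python
import io
import pandas as pd

print(pd.__version__)  # confirm which pandas version is running

df = pd.DataFrame({
    "a": [1, 2, 3, 4],
    "b": [5, 6, 7, 8],
    "c": [9, 10, 11, 12],
})

# Write tab-separated output with the columns in the order b, a, c.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", header=True, columns=["b", "a", "c"])
written = buffer.getvalue()
print(written)

# Read the text back to check that each label still carries its own values.
check = pd.read_csv(io.StringIO(written), sep="\t", index_col=0)
```

If the bug were still present, column `b` in `check` would hold 1 to 4 instead of 5 to 8.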
