Python pandas: output dataframe to csv with integers

囚心锁ツ 2020-12-08 02:15

I have a pandas.DataFrame that I wish to export to a CSV file. However, pandas seems to write some of the values as float instead of int.

6 answers
  • 2020-12-08 02:50

    The problem is that you are assigning values row by row, while dtypes are stored per column, so every column gets cast to object dtype. That is undesirable because you lose all efficiency. One way around it is to convert each column, which coerces it to float/int dtype as needed.

    As we answered in another question, if you construct the frame all at once (or column by column), this step is not needed.

    In [23]: def convert(x):
       ....:     try:
       ....:         return x.astype(int)
       ....:     except (TypeError, ValueError):
       ....:         return x
       ....:     
    
    In [24]: df.apply(convert)
    Out[24]: 
        a   b   c   d
    x  10  10 NaN  10
    y   1   5   2   3
    z   1   2   3   4
    
    In [25]: df.apply(convert).dtypes
    Out[25]: 
    a      int64
    b      int64
    c    float64
    d      int64
    dtype: object
    
    In [26]: df.apply(convert).to_csv('test.csv')
    
    In [27]: !cat test.csv
    ,a,b,c,d
    x,10,10,,10
    y,1,5,2.0,3
    z,1,2,3.0,4
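
The remark about constructing the frame all at once can be illustrated with a minimal sketch. The sample values are inferred from the tables printed in this answer, and the names `rowwise` and `at_once` are hypothetical. Row-by-row assignment into an empty frame leaves every column as object, while all-at-once construction lets pandas pick a numeric dtype per column. Column `c` still becomes float because it contains NaN.

```python
import numpy as np
import pandas as pd

columns = ["a", "b", "c", "d"]
index = ["x", "y", "z"]

# Row-by-row assignment into an empty frame: every column stays object dtype.
rowwise = pd.DataFrame(index=index, columns=columns)
rowwise.loc["x"] = [10, 10, np.nan, 10]
rowwise.loc["y"] = [1, 5, 2, 3]
rowwise.loc["z"] = [1, 2, 3, 4]

# All-at-once construction: dtypes are inferred per column.
at_once = pd.DataFrame(
    {"a": [10, 1, 1], "b": [10, 5, 2], "c": [np.nan, 2, 3], "d": [10, 3, 4]},
    index=index,
)

print(rowwise.dtypes)
print(at_once.dtypes)
```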
    
  • 2020-12-08 02:52

    The answer I was looking for was a slight variation of what @Jeff proposed in his answer, so the credit goes to him. For reference, this is what solved my problem in the end:

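        import pandas
        # `data` holds the raw values from the question (not shown here).
        df = pandas.DataFrame(data, columns=['a','b','c','d'], index=['x','y','z'])
        df = df.fillna(0)                # replace NaN with 0 so every column can be cast to int
        df = df.astype(int)
        df.to_csv('test.csv', sep='\t')  # tab-separated output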
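
A self-contained sketch of the same steps, writing to an in-memory buffer instead of `test.csv`. The sample values are an assumption inferred from the tables printed elsewhere in this thread. Note that the missing value in column `c` is written as `0`, so the information that it was missing is lost.

```python
import io

import numpy as np
import pandas as pd

# Sample values are an assumption; the original `data` is not shown.
data = [[10, 10, np.nan, 10], [1, 5, 2, 3], [1, 2, 3, 4]]
df = pd.DataFrame(data, columns=["a", "b", "c", "d"], index=["x", "y", "z"])

# Replace NaN with 0, then cast every column to int.
filled = df.fillna(0).astype(int)

buffer = io.StringIO()
filled.to_csv(buffer, sep="\t")
tsv_lines = buffer.getvalue().splitlines()
print("\n".join(tsv_lines))
```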
    
  • 2020-12-08 02:53

    You can use astype() to specify the data type for each column.

    For example:

    import pandas
    df = pandas.DataFrame(data, columns=['a','b','c','d'], index=['x','y','z'])
    
    df = df.astype({"a": int, "b": complex, "c" : float, "d" : int})
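
One caveat with the per-column mapping, sketched below with assumed sample values: a plain `int` cast works on a column without missing values, but raises on a column that contains NaN, so such a column has to stay float or be filled first. The names `converted` and `nan_cast_failed` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [10.0, 1.0, 1.0], "c": [np.nan, 2.0, 3.0]},
    index=["x", "y", "z"],
)

# Casting only column "a"; columns not named in the mapping keep their dtype.
converted = df.astype({"a": int})

# Casting a column that contains NaN to int raises a ValueError.
try:
    df.astype({"c": int})
    nan_cast_failed = False
except ValueError:
    nan_cast_failed = True

print(converted.dtypes)
print(nan_cast_failed)
```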
    
  • 2020-12-08 02:53

    You can convert your DataFrame into a NumPy array as a workaround (use the built-in int here, since the np.int alias was removed in NumPy 1.24):

     import numpy as np
     np.savetxt(savepath, np.array(df).astype(int), fmt='%i', delimiter=',', header='PassengerId,Survived', comments='')
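
A runnable sketch of this workaround, writing to an in-memory buffer rather than `savepath`. The two rows of data are made up for illustration; only the column names come from the answer. This only makes sense when the frame has no NaN, because casting NaN to an integer in NumPy does not produce a meaningful value.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data; only the column names appear in the answer above.
df = pd.DataFrame({"PassengerId": [892.0, 893.0], "Survived": [0.0, 1.0]})

buffer = io.StringIO()
np.savetxt(
    buffer,
    df.to_numpy().astype(int),    # cast the whole array to integers
    fmt="%i",
    delimiter=",",
    header=",".join(df.columns),  # header row built from the column names
    comments="",                  # suppress the default "# " header prefix
)
text = buffer.getvalue()
print(text)
```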
    
  • 2020-12-08 02:57

    If you want to preserve the NaN information in the exported CSV, follow the steps below. Note that I am concentrating on column 'c' in this case.

    df['c'] = df['c'].fillna('')       # fill NaN with an empty string
    df['c'] = df['c'].astype(str)      # convert the column to string
    >>> df
        a   b    c     d
    x  10  10         10
    y   1   5    2.0   3
    z   1   2    3.0   4

    df['c'] = df['c'].str.split('.')   # split the float value into a list on '.'
    >>> df
        a   b    c          d
    x  10  10   ['']       10
    y   1   5   ['2','0']   3
    z   1   2   ['3','0']   4

    df['c'] = df['c'].str[0]           # select the first element of the list
    >>> df
        a   b    c   d
    x  10  10       10
    y   1   5    2   3
    z   1   2    3   4
    

    Now, if you export the DataFrame to CSV, column 'c' will not contain float values and the NaN information is preserved as empty fields.
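
The same result (integers in column 'c', empty fields for NaN) can also be reached without string manipulation. The sketch below is an alternative, not what this answer does: it uses the `float_format` parameter of `DataFrame.to_csv`, with sample values assumed from the tables above. Be aware that `float_format` applies to every float column in the frame, so genuine fractional values would be rounded too.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [10, 1, 1], "b": [10, 5, 2], "c": [np.nan, 2, 3], "d": [10, 3, 4]},
    index=["x", "y", "z"],
)

# Format floats with zero decimals; NaN is still written as an empty field.
csv_lines = df.to_csv(float_format="%.0f").splitlines()
print("\n".join(csv_lines))
```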

  • 2020-12-08 03:09

    This is a "gotcha" in pandas (Support for integer NA), where integer columns with NaNs are converted to floats.

    This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be “numeric”. One possibility is to use dtype=object arrays instead.
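
Two ways around the gotcha can be sketched as follows, with assumed sample values. The first uses the nullable `"Int64"` extension dtype available in newer pandas releases, which this answer does not mention. The second uses the `dtype=object` option named above. In both cases missing entries are written as empty fields and the rest as integers.

```python
import numpy as np
import pandas as pd

index = ["x", "y", "z"]

# Option 1 (assumes a pandas version with nullable integers): "Int64" dtype.
df = pd.DataFrame({"a": [10, 1, 1], "c": [np.nan, 2, 3]}, index=index)
df["c"] = df["c"].astype("Int64")
nullable_lines = df.to_csv().splitlines()

# Option 2: an object column holding Python ints and None.
obj_col = pd.Series([None, 2, 3], index=index, dtype=object, name="c")
object_lines = obj_col.to_csv().splitlines()

print("\n".join(nullable_lines))
print("\n".join(object_lines))
```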
