I have a pandas.DataFrame that I wish to export to a CSV file. However, pandas seems to write some of the values as float instead of int.
The problem is that you are assigning values by row, but dtypes are grouped by column, so the values get cast to object dtype. That is not a good thing: you lose all efficiency. One fix is to convert each column, which coerces it to an int or float dtype as needed.
As we answered in another question, if you construct the frame all at once (or column by column), this step is not needed.
In [23]: def convert(x):
   ....:     try:
   ....:         return x.astype(int)
   ....:     except (ValueError, TypeError):
   ....:         return x
   ....:
In [24]: df.apply(convert)
Out[24]:
    a   b    c   d
x  10  10  NaN  10
y   1   5    2   3
z   1   2    3   4
In [25]: df.apply(convert).dtypes
Out[25]:
a      int64
b      int64
c    float64
d      int64
dtype: object
In [26]: df.apply(convert).to_csv('test.csv')
In [27]: !cat test.csv
,a,b,c,d
x,10,10,,10
y,1,5,2.0,3
z,1,2,3.0,4
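The remark above about constructing the frame all at once can be sketched as follows. This is a minimal illustration, assuming the sample values shown in the output above; the dict literal and the in-memory buffer are illustrative choices, not part of the original answer. Building each column in one go lets pandas infer an integer dtype wherever no value is missing, so no conversion step is needed.

```python
import io

import numpy as np
import pandas as pd

# Assumed sample data, matching the values printed above.
df = pd.DataFrame(
    {
        "a": [10, 1, 1],
        "b": [10, 5, 2],
        "c": [np.nan, 2, 3],  # the NaN forces this column to float64
        "d": [10, 3, 4],
    },
    index=["x", "y", "z"],
)

# Columns without missing values are inferred as integers directly.
print(df.dtypes)

# Write to an in-memory buffer instead of a file to inspect the result.
buf = io.StringIO()
df.to_csv(buf)
csv_text = buf.getvalue()
print(csv_text)
```

Only column c, which really contains a missing value, is still written with a decimal point.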
The answer I was looking for was a slight variation of what @Jeff proposed in his answer, so the credit goes to him. For reference, this is what solved my problem in the end:
import pandas
df = pandas.DataFrame(data, columns=['a','b','c','d'], index=['x','y','z'])
df = df.fillna(0)
df = df.astype(int)
df.to_csv('test.csv', sep='\t')
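Note that fillna(0) replaces the missing value with a real 0 in the output. If that is not acceptable, one alternative (a different technique from the one above, shown here only as a hedged sketch) is the float_format parameter of to_csv, which formats float values at write time while NaN is still written as an empty field. The sample data and the buffer are assumptions made for illustration.

```python
import io

import numpy as np
import pandas as pd

# Assumed sample data: column c is float64 because of the NaN.
df = pd.DataFrame(
    {"a": [10, 1, 1], "c": [np.nan, 2.0, 3.0]},
    index=["x", "y", "z"],
)

# "%.0f" prints floats with no decimal places; NaN is still written
# as an empty field (the default na_rep).
buf = io.StringIO()
df.to_csv(buf, float_format="%.0f")
csv_text = buf.getvalue()
print(csv_text)
```

Be aware that float_format applies to every float column in the frame, so genuinely fractional values would be rounded as well.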
You can use astype() to specify the data type for each column. For example:
import pandas
df = pandas.DataFrame(data, columns=['a','b','c','d'], index=['x','y','z'])
df = df.astype({"a": int, "b": complex, "c" : float, "d" : int})
You can convert your DataFrame into a NumPy array as a workaround:
# np.int was removed in NumPy 1.24; the builtin int is equivalent here
np.savetxt(savepath, np.array(df).astype(int), fmt='%i', delimiter=',', header='PassengerId,Survived', comments='')
If you want to preserve the NaN information in the exported CSV, do the following. P.S.: I'm concentrating on column 'c' in this case.
df['c'] = df['c'].fillna('')  # fill NaN with an empty string
df['c'] = df['c'].astype(str)  # convert the column to string
>>> df
    a   b    c   d
x  10  10       10
y   1   5  2.0   3
z   1   2  3.0   4
df['c'] = df['c'].str.split('.')  # split each float string into a list on '.'
>>> df
    a   b          c   d
x  10  10       ['']  10
y   1   5  ['2','0']   3
z   1   2  ['3','0']   4
df['c'] = df['c'].str[0]  # select the first element of each list
>>> df
    a   b  c   d
x  10  10     10
y   1   5  2   3
z   1   2  3   4
Now, if you export the DataFrame to CSV, column 'c' will not contain float values and the NaN information is preserved.
This is a "gotcha" in pandas (Support for integer NA), where integer columns with NaNs are converted to floats.
This trade-off is made largely for memory and performance reasons, and also so that the resulting Series continues to be “numeric”. One possibility is to use dtype=object arrays instead.
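The behaviour described above can be sketched as follows. This is a minimal illustration under assumed sample data, not a definitive recipe: it shows an integer Series being upcast to float64 once a missing value appears, then the dtype=object option mentioned above. It also shows the nullable "Int64" extension dtype, which is not mentioned in the quoted text but is available in more recent pandas versions. The helper csv_lines is a hypothetical name introduced only for this example.

```python
import io

import numpy as np
import pandas as pd


def csv_lines(frame):
    """Hypothetical helper: return the CSV output of a frame as a list of lines."""
    buf = io.StringIO()
    frame.to_csv(buf)
    return buf.getvalue().splitlines()


# An integer Series is upcast to float64 as soon as a missing value appears.
ints = pd.Series([1, 2, 3], index=["x", "y", "z"])
with_gap = ints.reindex(["w", "x", "y", "z"])  # "w" has no value, so it becomes NaN
print(with_gap.dtype)

# Option from the quoted text: keep Python ints in an object-dtype column.
as_object = pd.DataFrame(
    {"c": pd.Series([np.nan, 2, 3], index=["x", "y", "z"], dtype=object)}
)
object_lines = csv_lines(as_object)

# Assumption beyond the quoted text: the nullable integer dtype "Int64".
as_nullable = pd.DataFrame(
    {"c": pd.Series([np.nan, 2.0, 3.0], index=["x", "y", "z"]).astype("Int64")}
)
nullable_lines = csv_lines(as_nullable)

print(object_lines)
print(nullable_lines)
```

Both variants write the missing value as an empty field and the remaining values without a decimal point. The object column gives up vectorised numeric operations, which is the trade-off the quoted text refers to.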