I am importing a CSV file like the one below, using pandas.read_csv:
df = pd.read_csv(Input, delimiter=";")
Example of the CSV file:
I think the earlier mentioned answer of including decimal=","
in pandas.read_csv is the preferred option.
However, I found it is incompatible with the Python parsing engine. For example, when using skiprows=
, read_csv may fall back to that engine, so as far as I know you can't use skiprows=
and decimal=
in the same read_csv call. Also, I haven't been able to actually get the decimal=
parameter to work (though that is probably my own fault).
The long way round I used to achieve the same result is with a list comprehension, .replace
, and .astype
. The major downside of this method is that it has to be done one column at a time:
import pandas as pd

df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'],
                   'b': ['51,23', '18,45', '28,90', '133,00']})
# Swap the comma for a dot in each cell, then cast the column to float
df['a'] = [x.replace(',', '.') for x in df['a']]
df['a'] = df['a'].astype(float)
Now, column a will have float type cells. Column b still contains strings.
Note that the .replace
used here is not pandas' Series.replace but Python's built-in string method. Pandas' version requires the value to be an exact match or a regex.
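As an aside (not part of the original answer), the same per-column conversion can be done without a Python-level loop by using pandas' .str accessor, which applies the string replacement to every element of the column at once:

```python
import pandas as pd

df = pd.DataFrame({'a': ['120,00', '42,00', '18,00', '23,00'],
                   'b': ['51,23', '18,45', '28,90', '133,00']})

# Vectorized equivalent of the list comprehension above:
# replace the comma with a dot element-wise, then cast to float.
df['a'] = df['a'].str.replace(',', '.').astype(float)
```

This is still one column at a time, but it stays within pandas' vectorized string operations.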
pandas.read_csv
has a decimal
parameter for exactly this; see the docs.
I.e. try:
df = pd.read_csv(Input, delimiter=";", decimal=",")
This answers the question of how to change the decimal comma
to a decimal dot
with Python pandas.
$ cat test.py
import pandas as pd
df = pd.read_csv("test.csv", quotechar='"', decimal=",")
df.to_csv("test2.csv", sep=',', encoding='utf-8', quotechar='"', decimal='.')
where we read with the decimal separator specified as a comma, while on output it is specified as a dot. So
$ cat test.csv
header,header2
1,"2,1"
3,"4,0"
$ cat test2.csv
,header,header2
0,1,2.1
1,3,4.0
where you see that the decimal separator has changed to a dot.
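Note that test2.csv also gained a leading unnamed column: that is the DataFrame index, which to_csv writes by default. If it is unwanted, pass index=False. A sketch using in-memory buffers instead of the test.csv/test2.csv files:

```python
import io
import pandas as pd

# Same data as test.csv: quoted fields with a decimal comma.
df = pd.read_csv(io.StringIO('header,header2\n1,"2,1"\n3,"4,0"\n'),
                 quotechar='"', decimal=",")

# index=False suppresses the extra unnamed index column in the output.
out = io.StringIO()
df.to_csv(out, sep=',', quotechar='"', decimal='.', index=False)
```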