Is there a way to convert values like '34%' directly to int or float when using read_csv in pandas? I would like it to be read directly as 0.34.
Using this in r
You were very close with your df attempt. Try changing:
df['col'] = df['col'].astype(float)
to:
df['col'] = df['col'].str.rstrip('%').astype('float') / 100.0
#                    ^ use str funcs to elim '%'      ^ divide by 100
# could also be:     .str[:-1].astype(...
Pandas supports Python's string processing ability. Just precede the string function you want with .str and see if it does what you need. (This includes string slicing, too, of course.)
Above we use .str.rstrip() to get rid of the trailing percent sign, then divide the whole column by 100.0 to convert from a percentage to the actual value. For example, 45% becomes 0.45.
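As a rough end-to-end sketch of the steps above: the sample data and its comma-separated layout are assumptions made for illustration, not taken from the question.

```python
import io
import pandas as pd

# Hypothetical sample data with '34%'-style values
raw = "index,col\n113,34%\n122,50%\n123,32%\n301,12%\n"

# Without any hints, read_csv leaves 'col' as strings such as '34%'
df = pd.read_csv(io.StringIO(raw), index_col=0)

# Strip the trailing '%', cast to float, then scale to a fraction
df["col"] = df["col"].str.rstrip("%").astype(float) / 100.0

print(df["col"].tolist())
print(df["col"].dtype)
```

The conversion happens after loading, so the column is briefly held as strings; for very large files a converter passed to read_csv may be worth comparing.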
Although .str.rstrip('%') could also just be .str[:-1], I prefer to explicitly remove the '%' rather than blindly removing the last char, just in case...
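To illustrate the "just in case" scenario, here is a small sketch with a hypothetical column in which one value is missing its percent sign. It assumes that a bare '50' should still be read as 50%.

```python
import pandas as pd

# Hypothetical column: the middle value has no trailing '%'
s = pd.Series(["34%", "50", "12%"])

# rstrip removes the '%' only where it is present
stripped = s.str.rstrip("%").astype(float) / 100.0

# slicing always drops the last character, so '50' becomes '5'
sliced = s.str[:-1].astype(float) / 100.0

print(stripped.tolist())
print(sliced.tolist())
```

With slicing, the '50' entry silently turns into 0.05 instead of 0.5, and no error is raised to flag it.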
You can define a custom function to convert your percentages to floats and pass it to read_csv through the converters parameter:
In [149]:
import io
import pandas as pd

# dummy data
temp1 = """index col
113 34%
122 50%
123 32%
301 12%"""
# custom function taken from https://stackoverflow.com/questions/12432663/what-is-a-clean-way-to-convert-a-string-percent-to-a-float
def p2f(x):
    # strip the percent sign, then scale to a fraction
    return float(x.strip('%')) / 100
# pass to the converters param as a dict
df = pd.read_csv(io.StringIO(temp1), sep=r'\s+', index_col=[0], converters={'col': p2f})
df
Out[149]:
        col
index
113    0.34
122    0.50
123    0.32
301    0.12
In [150]:
# check that dtypes really are floats
df.dtypes
Out[150]:
col    float64
dtype: object
My percent to float code is courtesy of ashwini's answer: What is a clean way to convert a string percent to a float?
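If the column can also contain blanks or placeholder text, the p2f converter above would raise a ValueError on those cells. A possible variant is sketched below; the name p2f_safe, the '-' placeholder, and the sample data are all hypothetical, chosen only for illustration.

```python
import io
import math
import pandas as pd

def p2f_safe(x):
    # Hypothetical variant of p2f: tolerate padding, blanks and a '-' placeholder
    x = x.strip()
    if x in ("", "-"):
        return float("nan")
    return float(x.rstrip("%")) / 100

# Hypothetical data: one empty cell, one padded value, one '-' placeholder
raw = "index,col\n113,34%\n122,\n123, 32% \n301,-\n"

df = pd.read_csv(io.StringIO(raw), index_col=0, converters={"col": p2f_safe})
print(df)
print(df.dtypes)
```

Because a converter receives the raw cell text, the function itself has to decide what counts as missing.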