import pandas as pd
path1 = "/home/supertramp/Desktop/100&life_180_data.csv"
mydf = pd.read_csv(path1)
numcigar = {"Never":0 ,"1-5 Cigarettes/day" :1,"10-
OK, the first problem is that you have embedded spaces, which cause the function to be applied incorrectly. Fix this using the vectorised str methods:
mydf['Cigarettes'] = mydf['Cigarettes'].str.replace(' ', '')
Now creating your new column should just work:
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
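As a rough sketch of the clean-then-map step on made-up data (the sample values and the names df and mapping are hypothetical, since the real CSV is not shown): if the stray spaces sit only at the start or end of each value, str.strip() may be a gentler choice, because replace(' ', '') also removes the spaces inside labels such as "1-5 Cigarettes/day".

```python
import pandas as pd

# Hypothetical sample data; the real CSV contents are not shown in the question.
df = pd.DataFrame(
    {"Cigarettes": [" Never", "1-5 Cigarettes/day ", "Never", "Unknown"]}
)
mapping = {"Never": 0, "1-5 Cigarettes/day": 1}

# strip() removes only leading/trailing whitespace, so dict keys that
# contain internal spaces still match the cleaned values.
cleaned = df["Cigarettes"].str.strip()

# Labels missing from the dict come back as None from dict.get,
# and astype(float) turns those into NaN.
df["CigarNum"] = cleaned.apply(mapping.get).astype(float)
```

Here the "Unknown" row ends up as NaN rather than raising, which makes unmapped labels easy to spot afterwards.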
UPDATE
Thanks to @Jeff as always for pointing out superior ways to do things:
So you can call replace instead of calling apply:
mydf['CigarNum'] = mydf['Cigarettes'].replace(numcigar)
# now convert the types
mydf['CigarNum'] = mydf['CigarNum'].convert_objects(convert_numeric=True)
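A small illustration of the replace route on made-up data (the sample values and the name cigs are hypothetical). Unlike dict.get, replace leaves labels that are not in the dict untouched, so a numeric conversion afterwards is still useful. This sketch uses pd.to_numeric for that step, on the assumption that you are on a pandas version where convert_objects is no longer available.

```python
import pandas as pd

# Hypothetical sample column; object dtype is set explicitly.
cigs = pd.Series(
    ["Never", "1-5 Cigarettes/day", "Never", "Unknown"], dtype=object
)
mapping = {"Never": 0, "1-5 Cigarettes/day": 1}

# replace() swaps the matching labels and leaves "Unknown" as a string.
replaced = cigs.replace(mapping)

# Convert to a numeric dtype; the leftover string becomes NaN.
nums = pd.to_numeric(replaced, errors="coerce")
```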
You can also use the factorize method.
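For reference, a minimal factorize sketch on made-up data (the sample values are hypothetical). Note that factorize assigns integer codes in order of first appearance, so the numbers will not necessarily match the values chosen in numcigar.

```python
import pandas as pd

# Hypothetical sample column.
cigs = pd.Series(
    ["Never", "1-5 Cigarettes/day", "Never", "10-20 Cigarettes/day"]
)

# codes is an integer array; uniques holds each distinct label once,
# in the order the labels first appear.
codes, uniques = pd.factorize(cigs)
```

This is handy when you only need distinct integer labels and the particular numbers do not matter.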
Thinking about it, why not just set the dict values to be floats in the first place, so that you avoid the type conversion altogether?
So:
numcigar = {"Never":0.0 ,"1-5 Cigarettes/day" :1.0,"10-20 Cigarettes/day":4.0}
Version 0.17.0 or newer
convert_objects has been deprecated since 0.17.0 (and was removed in later releases); it has been replaced with to_numeric:
mydf['CigarNum'] = pd.to_numeric(mydf['CigarNum'], errors='coerce')
Here errors='coerce' will return NaN where the values cannot be converted to a numeric value. Without it, to_numeric will raise an exception.
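To make the difference concrete, here is a short sketch on made-up data (the sample values and the name raw are hypothetical) comparing the two behaviours:

```python
import pandas as pd

# Hypothetical sample column with one value that cannot be parsed.
raw = pd.Series(["0", "1", "not a number"], dtype=object)

# With errors='coerce' the unparseable value becomes NaN.
coerced = pd.to_numeric(raw, errors="coerce")

# With the default errors='raise' the same input raises ValueError.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True
```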