import pandas as pd
path1 = "/home/supertramp/Desktop/100&life_180_data.csv"
mydf = pd.read_csv(path1)
numcigar = {"Never":0 ,"1-5 Cigarettes/day" :1,"10-
Try using this function for problems of this kind:
import numpy as np

def get_series_ids(x):
    '''Return a pandas Series of integer ids, one per
    distinct object in the input pandas Series x.
    Example:
    get_series_ids(pd.Series(['a','a','b','b','c']))
    returns Series([0,0,1,1,2], dtype=int)'''
    # Sorted distinct values found in x
    values = np.unique(x)
    # Map each distinct value to its position in the sorted order
    values2nums = dict(zip(values, range(len(values))))
    return x.replace(values2nums)
OK, the first problem is that your data has embedded spaces, which stop the mapping from being applied correctly. Fix this using the vectorised str methods:
mydf['Cigarettes'] = mydf['Cigarettes'].str.replace(' ', '')
Now creating your new column should just work:
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
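One caveat: str.replace(' ', '') removes every space, including the ones inside a value, so a key such as "1-5 Cigarettes/day" would no longer match unless the dict keys have their spaces removed too. A minimal sketch of an alternative, assuming the stray spaces are only leading or trailing. The sample values and the names df and mapping are made up for illustration, since the real CSV contents are not shown:

```python
import pandas as pd

# Hypothetical sample data with stray leading/trailing spaces.
df = pd.DataFrame({"Cigarettes": [" Never", "1-5 Cigarettes/day ", "Never"]})
# Hypothetical mapping whose keys keep their internal spaces.
mapping = {"Never": 0, "1-5 Cigarettes/day": 1}

# str.strip() trims only the ends of each string, so internal spaces survive
# and the cleaned values still match the dict keys.
df["Cigarettes"] = df["Cigarettes"].str.strip()
df["CigarNum"] = df["Cigarettes"].apply(mapping.get).astype(float)
```

If the data really does need every space removed, the dict keys would have to be written without spaces as well.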
UPDATE
Thanks to @Jeff as always for pointing out superior ways to do things:
So you can call replace instead of calling apply:
mydf['CigarNum'] = mydf['Cigarettes'].replace(numcigar)
# now convert the types
mydf['CigarNum'] = mydf['CigarNum'].convert_objects(convert_numeric=True)
You can also use the factorize method.
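As a rough illustration of the factorize route, here is a small sketch on made-up sample values (the Series s is hypothetical). Note that factorize assigns codes in order of first appearance, so it will not reproduce a hand-picked mapping such as 4 for "10-20 Cigarettes/day":

```python
import pandas as pd

# Hypothetical sample column; the labels are taken from the mapping in this thread.
s = pd.Series(["Never", "1-5 Cigarettes/day", "Never", "10-20 Cigarettes/day"])

# codes is an integer array with one code per row;
# uniques holds the distinct labels in order of first appearance.
codes, uniques = pd.factorize(s)
```

This is handy when any consistent numbering will do; when the numbers carry meaning, an explicit dict is the safer choice.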
Thinking about it, why not just make the dict values floats in the first place, so that you avoid the type conversion?
So:
numcigar = {"Never":0.0 ,"1-5 Cigarettes/day" :1.0,"10-20 Cigarettes/day":4.0}
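A quick sketch of what that buys you, on a made-up sample column. It uses Series.map rather than replace, which is a swapped-in alternative and not what the answer above calls: map with a dict of floats yields a float64 column directly.

```python
import pandas as pd

numcigar = {"Never": 0.0, "1-5 Cigarettes/day": 1.0, "10-20 Cigarettes/day": 4.0}

# Hypothetical sample column standing in for mydf['Cigarettes'].
s = pd.Series(["Never", "10-20 Cigarettes/day", "1-5 Cigarettes/day"])

# Each label is looked up in the dict; labels missing from the dict become NaN.
nums = s.map(numcigar)
```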
Version 0.17.0 or newer
convert_objects is deprecated since 0.17.0; it has been replaced with to_numeric:
mydf['CigarNum'] = pd.to_numeric(mydf['CigarNum'], errors='coerce')
Here errors='coerce' will return NaN where the values cannot be converted to a numeric value; without this it will raise an exception.
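To make the difference concrete, here is a small sketch on a made-up Series (the values and the names s, coerced and raised are hypothetical):

```python
import pandas as pd

# Hypothetical column holding numeric strings plus one unparseable value.
s = pd.Series(["1", "4", "unknown"])

# With errors='coerce', the unparseable entry becomes NaN.
coerced = pd.to_numeric(s, errors="coerce")

# With the default errors='raise', the same input raises a ValueError.
try:
    pd.to_numeric(s)
    raised = False
except ValueError:
    raised = True
```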