So I have a dataframe:
import pandas as pd
df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                   'score': [1, 3, 4, 5, 2]})
The problem is that apply, when called on a column, passes every single value in that column to your function. Inside are_you_ok, df is therefore not a DataFrame but (in your case) an integer. Naturally, Python complains that you cannot index into an integer with ['happiness'].
Your code is quite easy to fix, though. Just rewrite are_you_ok so that it works with integer arguments.
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
   ...:                    'score': [1, 3, 4, 5, 2]})
   ...:
In [3]: def are_you_ok(x):
   ...:     if x >= 4:
   ...:         return 'happy'
   ...:     elif x <= 2:
   ...:         return 'sad'
   ...:     else:
   ...:         return 'ok'
   ...:
In [4]: df['happiness'] = df['score'].apply(are_you_ok)
In [5]: df
Out[5]:
    name  score happiness
0  Jason      1       sad
1  Molly      3        ok
2   Tina      4     happy
3   Jake      5     happy
4    Amy      2       sad
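If you want to check for yourself what apply hands to the function, a small sketch like the following may help. The record helper and the received list are hypothetical names used only for illustration; they are not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                   'score': [1, 3, 4, 5, 2]})

received = []  # hypothetical: collects every argument apply passes in

def record(x):
    # x is one element of the column, not the DataFrame
    received.append(x)
    return x

df['score'].apply(record)

# Each call got a single score, never a DataFrame or a Series
got_frame = any(isinstance(v, (pd.DataFrame, pd.Series)) for v in received)
print(received, got_frame)
```

Since each call only sees one score, anything like x['happiness'] inside the function has nothing to index into.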
Sounds like you want np.select from numpy:
import numpy as np
conds = [df.score >= 4, df.score <= 2]
choices = ['happy', 'sad']
df['happiness'] = np.select(conds, choices, default='ok')
>>> df
    name  score happiness
0  Jason      1       sad
1  Molly      3        ok
2   Tina      4     happy
3   Jake      5     happy
4    Amy      2       sad
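One thing worth knowing about np.select: the conditions are checked in order, and the first one that is true for an element decides its label; default is used only where none match. A minimal sketch with deliberately overlapping conditions (the thresholds and variable names here are made up for illustration):

```python
import numpy as np
import pandas as pd

scores = pd.Series([1, 3, 4, 5, 2])

# The second condition is also true for 4 and 5, but the first match wins
conds = [scores >= 4, scores >= 3]
choices = ['happy', 'ok']

labels = np.select(conds, choices, default='sad')
print(labels.tolist())
```

So when conditions overlap, put the most specific one first.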
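Note: older pandas versions let you skip the explicit numpy import by using pandas.np (or pd.np, depending on how you imported pandas) instead of np. That alias was deprecated in pandas 1.0 and removed in pandas 2.0, so on current versions import numpy explicitly.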
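Using pd.cut:
pd.cut(df.score, [0, 2, 4, np.inf], labels=['sad', 'ok', 'happy'])
Out[594]:
0      sad
1       ok
2       ok
3    happy
4      sad
# df['yourcol'] = pd.cut(df.score, [0, 2, 4, np.inf], labels=['sad', 'ok', 'happy'])
Note that pd.cut bins are closed on the right by default, so with these edges a score of 4 falls into (2, 4] and is labeled 'ok', unlike in the apply and np.select versions where 4 is 'happy'. (np.inf is used here because the np.Inf alias was removed in NumPy 2.0.)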
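If you want pd.cut to give the same labels as the apply and np.select versions, one option is to move the middle edge from 4 to 3. This is only a sketch and it assumes the scores are integers: a fractional score such as 3.5 would be labeled 'happy' here, whereas the x >= 4 rule would call it 'ok'.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
                   'score': [1, 3, 4, 5, 2]})

# Right-closed bins: (0, 2] -> sad, (2, 3] -> ok, (3, inf] -> happy
# Assumption: integer scores, so (3, inf] is the same as score >= 4
df['happiness'] = pd.cut(df['score'], [0, 2, 3, np.inf],
                         labels=['sad', 'ok', 'happy'])
print(df)
```

The resulting column has a categorical dtype; call .astype(str) on it if you need plain strings.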