My instinct would have been to use .map(), but I compared your solution against .map() on a DataFrame with 1500 random male/female values:
%timeit df_base['Sex_new'] = df_base['Sex'].map({'male': 0,'female': 1})
1000 loops, best of 3: 653 µs per loop
Edited based on coldspeed's comment, and because reassigning the result makes for a fairer comparison with the others:
%timeit df_base['Sex_new'] = df_base['Sex'].replace(['male','female'],[0,1])
1000 loops, best of 3: 968 µs per loop
So .replace() is actually slower than .map() in this example, which means your 'shoddy' solution does not beat .map() here.
Edit
pygo's solution:
%timeit df_base['Sex_new'] = np.where(df_base['Sex'] == 'male', 0, 1)
1000 loops, best of 3: 331 µs per loop
So this is faster than both .map() and .replace()!
Jezrael's solution with .astype(int):
%timeit df_base['Sex_new'] = (df_base['Sex'] == 'female').astype(int)
1000 loops, best of 3: 388 µs per loop
So this is also faster than .map() and .replace().
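If you want to rerun the comparison outside IPython, a minimal sketch along these lines might do. It assumes the column contains only 'male' and 'female'; the random seed, the repetition counts, and the names candidates, results, and timings are my own choices for illustration, not something from the question. Absolute numbers will differ by machine and pandas version, so treat the ranking as indicative only.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: 1500 random 'male'/'female' values, stored as an object column.
rng = np.random.default_rng(0)
values = rng.choice(['male', 'female'], size=1500)
df_base = pd.DataFrame({'Sex': pd.Series(values, dtype=object)})

# The four approaches timed above, wrapped so they can be called repeatedly.
candidates = {
    'map': lambda: df_base['Sex'].map({'male': 0, 'female': 1}),
    'replace': lambda: df_base['Sex'].replace(['male', 'female'], [0, 1]),
    'np.where': lambda: np.where(df_base['Sex'] == 'male', 0, 1),
    'astype': lambda: (df_base['Sex'] == 'female').astype(int),
}

# Sanity check first: every approach should yield the same 0/1 encoding.
results = {name: np.asarray(func(), dtype=int) for name, func in candidates.items()}
for name, encoded in results.items():
    assert np.array_equal(encoded, results['map']), name

# Best of 3 runs of 100 calls each, reported in microseconds per call.
timings = {}
for name, func in candidates.items():
    best = min(timeit.repeat(func, number=100, repeat=3))
    timings[name] = best / 100 * 1e6

for name, micros in sorted(timings.items(), key=lambda item: item[1]):
    print(f'{name:>8}: {micros:8.1f} µs per call')
```

Note that the np.where version sends anything that is not 'male' to 1, so it only matches the others when the column has no missing or unexpected values.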