I have a DataFrame with multiple columns containing 'yes' and 'no' strings. I want to convert all of them to a boolean dtype. To map one column, I would use
You could use a stack/unstack idiom:
df.stack().map(dict_map_yn_bool).unstack()
Using @jezrael's setup
df = pd.DataFrame({'nearby_subway_station':['yes','no'], 'Station':['no','yes']})
dict_map_yn_bool={'yes':True, 'no':False}
Then
df.stack().map(dict_map_yn_bool).unstack()
Station nearby_subway_station
0 False True
1 True False
Timing (small data and bigger data; the measurements were not preserved)
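Timings vary with data size and pandas version, so it is probably best to measure on your own data. Here is a minimal sketch of how that could be done. The helper names (`make_frame`, `via_stack`, `via_column_map`, `via_replace`, `time_all`) and the row counts are my own assumptions for illustration, not part of the answers here.

```python
import timeit

import numpy as np
import pandas as pd

dict_map_yn_bool = {'yes': True, 'no': False}


def make_frame(n_rows, n_cols=5, seed=0):
    # Hypothetical helper: random 'yes'/'no' frame of the requested size.
    rng = np.random.default_rng(seed)
    data = rng.choice(['yes', 'no'], size=(n_rows, n_cols))
    return pd.DataFrame(data, columns=[f'col_{i}' for i in range(n_cols)])


def via_stack(df):
    return df.stack().map(dict_map_yn_bool).unstack()


def via_column_map(df):
    return df.apply(lambda col: col.map(dict_map_yn_bool))


def via_replace(df):
    return df.replace(dict_map_yn_bool)


def time_all(df, number=5):
    # Best-of-3 average seconds per call for each approach.
    return {
        f.__name__: min(timeit.repeat(lambda: f(df), number=number, repeat=3)) / number
        for f in (via_stack, via_column_map, via_replace)
    }


for label, n_rows in [('small data', 100), ('bigger data', 20_000)]:
    print(label, time_all(make_frame(n_rows)))
```

Newer pandas versions may print a FutureWarning for `stack` or `replace`; that does not affect the measurement itself.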
You can use applymap:
df = pd.DataFrame({'nearby_subway_station':['yes','no'], 'Station':['no','yes']})
print (df)
Station nearby_subway_station
0 no yes
1 yes no
dict_map_yn_bool={'yes':True, 'no':False}
df = df.applymap(dict_map_yn_bool.get)
print (df)
Station nearby_subway_station
0 False True
1 True False
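Note that `DataFrame.applymap` is deprecated since pandas 2.1 in favour of `DataFrame.map`. A small sketch that works on either side of that change, assuming you cannot pin the pandas version (the `elementwise` name is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'nearby_subway_station': ['yes', 'no'], 'Station': ['no', 'yes']})
dict_map_yn_bool = {'yes': True, 'no': False}

# DataFrame.map exists from pandas 2.1 on; older releases only have applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
df_bool = elementwise(dict_map_yn_bool.get)
print(df_bool)
```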
Another solution:
for x in df:
df[x] = df[x].map(dict_map_yn_bool)
print (df)
Station nearby_subway_station
0 False True
1 True False
Thanks to Jon Clements for a very nice idea, using replace:
df = df.replace({'yes': True, 'no': False})
print (df)
Station nearby_subway_station
0 False True
1 True False
There are some differences if the data contain values that are not in the dict:
df = pd.DataFrame({'nearby_subway_station':['yes','no','a'], 'Station':['no','yes','no']})
print (df)
Station nearby_subway_station
0 no yes
1 yes no
2 no a
applymap creates None for boolean and string columns, and NaN for numeric ones.
df = df.applymap(dict_map_yn_bool.get)
print (df)
Station nearby_subway_station
0 False True
1 True False
2 False None
map creates NaN:
for x in df:
df[x] = df[x].map(dict_map_yn_bool)
print (df)
Station nearby_subway_station
0 False True
1 True False
2 False NaN
replace doesn't create NaN or None; values that are not in the dict are left untouched:
df = df.replace(dict_map_yn_bool)
print (df)
Station nearby_subway_station
0 False True
1 True False
2 False a
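If leftover values like 'a' should be treated as an error rather than silently becoming None, NaN or staying as strings, one option is to check for them after mapping. This is only a sketch: `yes_no_to_bool` is a hypothetical helper name, it assumes unique column names, and it treats missing values in the input as unmapped too.

```python
import pandas as pd


def yes_no_to_bool(df, mapping):
    """Map every column through `mapping`; raise if any value is not a key."""
    out = {}
    for name in df.columns:
        mapped = df[name].map(mapping)
        # Series.map leaves NaN wherever the value was not found in the dict.
        unknown = df.loc[mapped.isna(), name]
        if not unknown.empty:
            raise ValueError(
                f"column {name!r} has unmapped values: {unknown.unique().tolist()}"
            )
        out[name] = mapped.astype(bool)
    return pd.DataFrame(out, index=df.index)


dict_map_yn_bool = {'yes': True, 'no': False}
df = pd.DataFrame({'nearby_subway_station': ['yes', 'no', 'a'],
                   'Station': ['no', 'yes', 'no']})

try:
    yes_no_to_bool(df, dict_map_yn_bool)
except ValueError as exc:
    print(exc)

# With only known values, the result has a real bool dtype.
print(yes_no_to_bool(df.iloc[:2], dict_map_yn_bool).dtypes)
```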
I would work with pandas.DataFrame.replace, as I think it is the simplest option and has built-in arguments to support this task. It is also a one-liner solution, as requested.
First case, replace all instances of 'yes' or 'no':
import pandas as pd
import numpy as np
from numpy import random
# Generating the data, 20 rows by 5 columns.
data = random.choice(['yes','no'], size=(20, 5), replace=True)
col_names = ['col_{}'.format(a) for a in range(1,6)]
df = pd.DataFrame(data, columns=col_names)
# Supplying lists of values to what they will replace. No dict needed.
df_bool = df.replace(to_replace=['yes','no'], value=[True, False])
Second case: you only want to replace values in a subset of columns, as described in the documentation for DataFrame.replace. Use a nested dictionary whose outer keys are the columns to change and whose values are dictionaries mapping old values to their replacements:
dict_map_yn_bool={'yes':True, 'no':False}
replace_dict = {'col_1':dict_map_yn_bool,
'col_2':dict_map_yn_bool}
df_bool = df.replace(to_replace=replace_dict)
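Depending on the pandas version, the replaced columns may come back as object dtype holding True/False rather than a real bool dtype (recent 2.x releases warn that the automatic downcast in replace is deprecated). A sketch of making the cast explicit for the chosen columns; the `subset` name and the seeded generator are assumptions of mine, added for reproducibility:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = rng.choice(['yes', 'no'], size=(20, 5))
col_names = ['col_{}'.format(a) for a in range(1, 6)]
df = pd.DataFrame(data, columns=col_names)

dict_map_yn_bool = {'yes': True, 'no': False}
subset = ['col_1', 'col_2']

# Nested dict: replace only inside the listed columns.
df_bool = df.replace({col: dict_map_yn_bool for col in subset})
# Cast explicitly instead of relying on replace to infer the dtype.
df_bool = df_bool.astype({col: bool for col in subset})

print(df_bool.dtypes)
```

Be careful with astype(bool) if a column might still contain other strings: any non-empty string becomes True, so check for unexpected values first.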