As the question says, I have a data frame df_original which is quite large but looks like:
ID  Count  Column 2  Column 3  Column 4
Assuming v is the "Column 3" Series of df_original:
v
RowX yes
RowY no
RowW yes
RowJ no
RowA yes
RowR no
RowX yes
RowY yes
RowW yes
RowJ yes
RowA yes
RowR no
Name: Column 3, dtype: object
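To try the options that follow, v can be rebuilt from the printout above. This is only a sketch: the labels and values are copied from that output, and the real v would presumably be selected from df_original instead.

```python
import pandas as pd

# Index labels and values copied from the printed Series above.
index = ['RowX', 'RowY', 'RowW', 'RowJ', 'RowA', 'RowR'] * 2
values = ['yes', 'no', 'yes', 'no', 'yes', 'no',
          'yes', 'yes', 'yes', 'yes', 'yes', 'no']

v = pd.Series(values, index=index, name='Column 3')
print(v)
```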
pd.factorize
1 - pd.factorize(v)[0]
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
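Note that pd.factorize numbers values in order of first appearance, so the `1 -` flip only gives yes = 1 because "yes" happens to come first in v. A minimal sketch of an order-independent variant, using the `sort=True` argument (v is rebuilt here from the printed values):

```python
import pandas as pd

v = pd.Series(['yes', 'no', 'yes', 'no', 'yes', 'no',
               'yes', 'yes', 'yes', 'yes', 'yes', 'no'], name='Column 3')

# Default: codes follow order of first appearance, so 'yes' -> 0, 'no' -> 1.
codes, uniques = pd.factorize(v)
flipped = 1 - codes

# sort=True: uniques are sorted, so 'no' -> 0 and 'yes' -> 1 regardless of order.
codes_sorted, uniques_sorted = pd.factorize(v, sort=True)

print(list(uniques), list(uniques_sorted))
print(codes_sorted)
```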
np.where
np.where(v == 'yes', 1, 0)
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
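np.where returns a plain NumPy array, so the row labels are lost. If they are needed, the result can be wrapped back into a Series; a small sketch (the name `flags` is just illustrative, and v is rebuilt from the printed values):

```python
import numpy as np
import pandas as pd

index = ['RowX', 'RowY', 'RowW', 'RowJ', 'RowA', 'RowR'] * 2
v = pd.Series(['yes', 'no', 'yes', 'no', 'yes', 'no',
               'yes', 'yes', 'yes', 'yes', 'yes', 'no'],
              index=index, name='Column 3')

# Reattach the original index and name to the NumPy result.
flags = pd.Series(np.where(v == 'yes', 1, 0), index=v.index, name=v.name)
print(flags)
```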
pd.Categorical / astype('category')
pd.Categorical(v).codes
array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0], dtype=int8)
v.astype('category').cat.codes
RowX 1
RowY 0
RowW 1
RowJ 0
RowA 1
RowR 0
RowX 1
RowY 1
RowW 1
RowJ 1
RowA 1
RowR 0
dtype: int8
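Here the codes come out as no = 0, yes = 1 because the categories are sorted alphabetically. A sketch that spells the order out instead of relying on sorting; values outside the listed categories get the code -1 (the "maybe" value is a made-up example):

```python
import pandas as pd

v = pd.Series(['yes', 'no', 'yes', 'no', 'yes', 'no',
               'yes', 'yes', 'yes', 'yes', 'yes', 'no'], name='Column 3')

# Position in `categories` becomes the code: 'no' -> 0, 'yes' -> 1.
codes = pd.Categorical(v, categories=['no', 'yes']).codes

# Hypothetical extra value: anything not in `categories` is coded as -1.
extra = pd.Categorical(['yes', 'maybe', 'no'], categories=['no', 'yes']).codes

print(codes)
print(extra)
```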
pd.Series.replace
v.replace({'yes' : 1, 'no' : 0})
RowX 1
RowY 0
RowW 1
RowJ 0
RowA 1
RowR 0
RowX 1
RowY 1
RowW 1
RowJ 1
RowA 1
RowR 0
Name: Column 3, dtype: int64
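A closely related alternative, not part of the list above, is Series.map with the same dictionary. The difference is in unmatched values: replace leaves them as they are, while map turns them into NaN. A quick sketch (the "maybe" value is a made-up example):

```python
import pandas as pd

v = pd.Series(['yes', 'no', 'yes', 'no', 'yes', 'no',
               'yes', 'yes', 'yes', 'yes', 'yes', 'no'], name='Column 3')

mapping = {'yes': 1, 'no': 0}

# Every value is in the mapping, so the result is all 1s and 0s.
flags = v.map(mapping)

# Hypothetical unmatched value: 'maybe' is not in the mapping and becomes NaN.
unmapped = pd.Series(['yes', 'maybe']).map(mapping)

print(flags)
print(unmapped)
```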
A fun, generalised version of the above:
v.replace({r'^(?!yes).*$' : 0}, regex=True).astype(bool).astype(int)
RowX 1
RowY 0
RowW 1
RowJ 0
RowA 1
RowR 0
RowX 1
RowY 1
RowW 1
RowJ 1
RowA 1
RowR 0
Name: Column 3, dtype: int64
Anything that does not start with "yes" becomes 0.
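To see what the pattern actually catches, it can be tried on a few strings with the standard re module. This only illustrates the regex itself, not how pandas applies it internally, and the sample strings are made up:

```python
import re

# Same pattern as above: a negative lookahead anchored at the start.
pattern = re.compile(r'^(?!yes).*$')

samples = ['yes', 'no', 'maybe', 'yesterday', '']

# A match means the value would be replaced by 0.
replaced = {s: pattern.search(s) is not None for s in samples}

# Unreplaced values stay non-empty strings, which are truthy, hence 1.
flags = {s: 0 if replaced[s] else int(bool(s)) for s in samples}

print(flags)
```

Note that "yesterday" survives as 1, since the lookahead only checks the prefix. For an exact match, a pattern such as `^(?!yes$).*$` would be stricter.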