Factorize a column of strings in pandas

说谎 2020-11-27 08:00

As the title says, I have a DataFrame, df_original, which is quite large but looks like this:

        ID    Count   Column 2   Column 3  Column 4
R         


        
1 Answer
  • 2020-11-27 09:04
    v
    
    RowX    yes
    RowY     no
    RowW    yes
    RowJ     no
    RowA    yes
    RowR     no
    RowX    yes
    RowY    yes
    RowW    yes
    RowJ    yes
    RowA    yes
    RowR     no
    Name: Column 3, dtype: object
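To try the snippets below without the original df_original, the Series v can be rebuilt from the output printed above. This is a sketch under one assumption: that v is simply the Column 3 column with the row labels as its index, as the "Name: Column 3" line suggests.

```python
import pandas as pd

# Rebuild the example Series from the printed output:
# index labels are the row names, values are the strings from "Column 3".
v = pd.Series(
    ["yes", "no", "yes", "no", "yes", "no",
     "yes", "yes", "yes", "yes", "yes", "no"],
    index=["RowX", "RowY", "RowW", "RowJ", "RowA", "RowR",
           "RowX", "RowY", "RowW", "RowJ", "RowA", "RowR"],
    name="Column 3",
)
print(v)
```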
    

    pd.factorize

    1 - pd.factorize(v)[0]
    array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
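The 1 - pd.factorize(v)[0] trick works here because pd.factorize numbers labels in order of first appearance and "yes" happens to come first. A minimal sketch of the difference, with made-up sample Series and a hypothetical helper yes_no_codes that passes sort=True so the codes follow lexical order ("no" < "yes") instead:

```python
import numpy as np
import pandas as pd


def yes_no_codes(s: pd.Series) -> np.ndarray:
    """Hypothetical helper: map "no" -> 0 and "yes" -> 1 via pd.factorize.

    sort=True orders the uniques lexically ("no" < "yes"), so the codes no
    longer depend on which label happens to appear first.
    """
    codes, _uniques = pd.factorize(s, sort=True)
    return codes


first_yes = pd.Series(["yes", "no", "no", "yes"])
first_no = pd.Series(["no", "yes", "yes", "no"])

# Without sort, codes follow order of first appearance, so "1 - codes"
# is only right when "yes" is the first label seen.
print(1 - pd.factorize(first_yes)[0])  # correct: yes -> 1
print(1 - pd.factorize(first_no)[0])   # inverted: yes -> 0

# With sort=True, "yes" -> 1 in both cases.
print(yes_no_codes(first_yes))
print(yes_no_codes(first_no))
```

One caveat for any factorize-based mapping: if a column contains only one of the two labels, that label always gets code 0 (so an all-"yes" column would come out as all zeros with sort=True). A direct comparison against "yes" does not have this problem.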
    

    np.where

    np.where(v == 'yes', 1, 0)
    array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0])
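np.where returns a plain NumPy array rather than a Series, but since it has one entry per row in row order it can be written straight back into the frame. A sketch, assuming a small hypothetical DataFrame in place of df_original and hypothetical output columns yes_flag and yes_flag_alt:

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for df_original.
df = pd.DataFrame(
    {"Count": [3, 1, 4], "Column 3": ["yes", "no", "yes"]},
    index=["RowX", "RowY", "RowW"],
)

# np.where yields an array aligned by position with the rows.
df["yes_flag"] = np.where(df["Column 3"] == "yes", 1, 0)

# Same result without NumPy: cast the boolean mask to integers.
df["yes_flag_alt"] = (df["Column 3"] == "yes").astype(int)

print(df)
```

Writing to new columns keeps the original strings for comparison; assigning to df["Column 3"] instead would convert the column in place.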
    

    pd.Categorical/astype('category')

    pd.Categorical(v).codes
    array([1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0], dtype=int8)
    
    v.astype('category').cat.codes
    
    RowX    1
    RowY    0
    RowW    1
    RowJ    0
    RowA    1
    RowR    0
    RowX    1
    RowY    1
    RowW    1
    RowJ    1
    RowA    1
    RowR    0
    dtype: int8
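Both categorical variants above get "no" -> 0 and "yes" -> 1 only because the inferred categories are sorted and "no" sorts before "yes". A sketch that pins the mapping explicitly, using a hypothetical Series with an unexpected label "maybe" to show what happens to values outside the listed categories:

```python
import pandas as pd

v = pd.Series(["yes", "no", "maybe", "yes"], name="Column 3")

# Passing categories explicitly fixes the code order: "no" -> 0, "yes" -> 1.
cat = pd.Categorical(v, categories=["no", "yes"])
print(cat.codes)  # "maybe" is not a listed category, so its code is -1

# The Series-level equivalent keeps the original index.
codes = v.astype(pd.CategoricalDtype(categories=["no", "yes"])).cat.codes
print(codes)
```

The -1 code makes stray labels easy to find, e.g. with codes == -1.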
    

    pd.Series.replace

    v.replace({'yes' : 1, 'no' : 0})
    
    RowX    1
    RowY    0
    RowW    1
    RowJ    0
    RowA    1
    RowR    0
    RowX    1
    RowY    1
    RowW    1
    RowJ    1
    RowA    1
    RowR    0
    Name: Column 3, dtype: int64
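Series.replace only swaps the values it finds in the dict and leaves everything else alone. For comparison, here is a sketch using Series.map (a swapped-in alternative, not part of the answer above), which turns any value without a key into NaN. The sample Series with a stray "maybe" is made up:

```python
import pandas as pd

v = pd.Series(["yes", "no", "maybe"], name="Column 3")
mapping = {"yes": 1, "no": 0}

# replace: unknown values survive unchanged, so "maybe" stays a string
# and the result has object dtype.
replaced = v.replace(mapping)

# map: every value is looked up; values without a key become NaN,
# so unexpected labels are easy to locate with isna().
mapped = v.map(mapping)

print(replaced.tolist())
print(mapped[mapped.isna()].index.tolist())  # rows whose label was not mapped
```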
    

    A fun, generalised version of the above:

    v.replace({r'^(?!yes).*$' : 0}, regex=True).astype(bool).astype(int)
    
    RowX    1
    RowY    0
    RowW    1
    RowJ    0
    RowA    1
    RowR    0
    RowX    1
    RowY    1
    RowW    1
    RowJ    1
    RowA    1
    RowR    0
    Name: Column 3, dtype: int64
    

    Anything that does not start with "yes" becomes 0.
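The negative lookahead in that pattern only checks the start of the string, so a value such as "yesterday" is left alone and ends up as 1. A sketch comparing it with a pattern whose lookahead also requires the end of the string, using a hypothetical helper to_flag and made-up sample values:

```python
import pandas as pd

v = pd.Series(["yes", "no", "yesterday", ""], name="Column 3")


def to_flag(s: pd.Series, pattern: str) -> pd.Series:
    """Hypothetical helper: cells matching `pattern` become 0, the rest 1.

    Surviving non-empty strings are truthy and 0 is falsy, so
    astype(bool).astype(int) turns them into 1 and 0.
    """
    return s.replace({pattern: 0}, regex=True).astype(bool).astype(int)


loose = to_flag(v, r"^(?!yes).*$")    # not *starting* with "yes" -> 0
strict = to_flag(v, r"^(?!yes$).*$")  # not *exactly* "yes" -> 0

print(loose.tolist())
print(strict.tolist())
```

When only the exact string "yes" should count, a plain comparison such as (v == "yes").astype(int) gives the same result as the strict pattern without any regex.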
