OneHotEncoder with string categorical values

南旧 2021-02-06 08:57

I have the following numpy matrix:

import numpy as np

M = [
    ['a', 5, 0.2, ''],
    ['a', 2, 1.3, 'as'],
    ['b', 1, 2.3, 'as'],
]
M = np.array(M)

1 answer
  • 2021-02-06 09:31

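    You can use DictVectorizer:

    from sklearn.feature_extraction import DictVectorizer
    import pandas as pd

    dv = DictVectorizer(sparse=False)
    # np.array stored every value of M as a string, so convert the numeric
    # columns back. The column labels are strings because DictVectorizer
    # sorts feature names, and Python 3 cannot compare int with str.
    df = pd.DataFrame(M, columns=['0', '1', '2', '3'])
    df = df.astype({'1': int, '2': float})
    dv.fit_transform(df.to_dict(orient='records'))

    array([[1. , 0. , 5. , 0.2, 1. , 0. ],
           [1. , 0. , 2. , 1.3, 0. , 1. ],
           [0. , 1. , 1. , 2.3, 0. , 1. ]])


    dv.feature_names_ holds the correspondence to the columns. String values become one indicator column per "column=value" pair, while numeric columns keep their label:

    ['0=a', '0=b', '1', '2', '3=', '3=as']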
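
Since the question title asks about OneHotEncoder, here is a minimal sketch of the same encoding done with OneHotEncoder inside a ColumnTransformer. It assumes a scikit-learn release whose OneHotEncoder accepts string categories directly. The column names c0 to c3 are made up for this example and are not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

M = np.array([
    ['a', 5, 0.2, ''],
    ['a', 2, 1.3, 'as'],
    ['b', 1, 2.3, 'as'],
])

# np.array stores every value as a string, so restore the numeric dtypes.
# The column names c0..c3 are hypothetical, chosen only for illustration.
df = pd.DataFrame(M, columns=['c0', 'c1', 'c2', 'c3'])
df = df.astype({'c1': int, 'c2': float})

# One-hot encode the two string columns and pass the numeric ones through.
ct = ColumnTransformer(
    [('onehot', OneHotEncoder(), ['c0', 'c3'])],
    remainder='passthrough',
    sparse_threshold=0,  # always return a dense array
)
X = ct.fit_transform(df)
print(X)
```

In the result, the indicator columns come first (c0 is 'a' or 'b', then c3 is '' or 'as'), followed by the untouched numeric columns c1 and c2. That differs from the DictVectorizer column order, so check the order before feeding the matrix to a model.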
