OneHotEncoder with string categorical values


I have the following numpy matrix:

import numpy as np

M = [
    ['a', 5, 0.2, ''],
    ['a', 2, 1.3, 'as'],
    ['b', 1, 2.3, 'as'],
]
M = np.array(M)


1 Answer

半阙折子戏

2021-02-06 09:31

You can use DictVectorizer:

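from sklearn.feature_extraction import DictVectorizer
import pandas as pd

dv = DictVectorizer(sparse=False)
# np.array(M) stored every value as a string, so cast the numeric columns back to float
# (convert_objects has been removed from pandas)
df = pd.DataFrame(M).astype({1: float, 2: float})
# use string column labels so the feature names can be sorted on Python 3
df.columns = df.columns.astype(str)
dv.fit_transform(df.to_dict(orient='records'))

array([[1. , 0. , 5. , 0.2, 1. , 0. ],
       [1. , 0. , 2. , 1.3, 0. , 1. ],
       [0. , 1. , 1. , 2.3, 0. , 1. ]])

dv.feature_names_ lists the feature that each output column corresponds to:

['0=a', '0=b', '1', '2', '3=', '3=as']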
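
As an alternative sketch (not part of the original answer): scikit-learn 0.20 and later lets OneHotEncoder encode string columns directly, which is what the question title asks about. The snippet below rebuilds the matrix M from the question and assumes that columns 0 and 3 are categorical and columns 1 and 2 are numeric; the names cat_cols, num_cols, cat, num and X are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

M = np.array([
    ['a', 5, 0.2, ''],
    ['a', 2, 1.3, 'as'],
    ['b', 1, 2.3, 'as'],
])  # mixed input, so NumPy stores every value as a string

cat_cols = [0, 3]  # assumed categorical columns
num_cols = [1, 2]  # assumed numeric columns

enc = OneHotEncoder()
# the encoder returns a sparse matrix by default; densify it for stacking
cat = enc.fit_transform(M[:, cat_cols]).toarray()
# cast the numeric columns from strings back to floats
num = M[:, num_cols].astype(float)

# numeric columns first, then the one-hot columns
X = np.hstack([num, cat])
print(X)
print(enc.categories_)  # categories found in each encoded column
```

Here X has six columns: the two numeric columns followed by one indicator column per category, in the order given by enc.categories_. Stacking by hand keeps the column order explicit; for larger pipelines, sklearn.compose.ColumnTransformer may be a tidier way to do the same thing.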
