How can I one hot encode a list of strings with Keras?

半阙折子戏 2021-01-02 03:27

I have a list:

code = ['', 'are', 'defined', 'in', 'the', '"editable', 'parameters"', '\n', 'section.', '\n', 'A', 'large

3 answers
  • 2021-01-02 04:11

    Instead, use pandas:

    import pandas as pd
    pd.get_dummies(code)
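A minimal sketch of what `get_dummies` returns for a short list of strings. The sample list `words` is a hypothetical stand-in for `code` (the question's list is truncated), and `dtype=int` is passed so the columns hold 0/1 integers, since the default dtype differs between pandas versions.

```python
import pandas as pd

# Hypothetical sample; the question's full `code` list is truncated
words = ["", "are", "defined", "in", "the", "\n", "section.", "\n", "the"]

# One column per distinct string (sorted), one row per list element
one_hot = pd.get_dummies(pd.Series(words), dtype=int)

print(list(one_hot.columns))  # the distinct strings, used as column labels
print(one_hot.to_numpy())     # 0/1 matrix of shape (len(words), number of distinct strings)
```

Each row contains exactly one 1, in the column named after that row's string.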
    
  • 2021-01-02 04:12

    Keras only supports one-hot encoding for data that has already been integer-encoded. You can integer-encode your strings manually like so:

    Manual encoding

    # this integer encoding is purely based on position; you can do this in other ways
    # note: a repeated word keeps the index of its last occurrence, so some integers go unused
    integer_mapping = {x: i for i, x in enumerate(code)}
    
    vec = [integer_mapping[word] for word in code]
    # vec is
    # [0, 1, 2, 3, 16, 5, 6, 22, 8, 22, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
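Because later duplicates overwrite earlier entries in that dict comprehension, the output above never uses 4, 7 or 9, which would leave empty columns after one-hot encoding. A minimal sketch, assuming you want one contiguous index per distinct word in order of first appearance; the helper `one_hot_encode` and the sample list are hypothetical, and only the standard library is used.

```python
def one_hot_encode(words):
    """Map each distinct word to a contiguous index (first-appearance order)
    and return (mapping, one-hot rows as lists of 0/1)."""
    # dict.fromkeys drops duplicates while keeping insertion order
    mapping = {word: i for i, word in enumerate(dict.fromkeys(words))}
    n_classes = len(mapping)
    rows = []
    for word in words:
        row = [0] * n_classes
        row[mapping[word]] = 1  # set the column belonging to this word
        rows.append(row)
    return mapping, rows


# Hypothetical sample; the question's full `code` list is truncated
sample = ["", "are", "the", "\n", "the", "\n"]
mapping, rows = one_hot_encode(sample)
print(mapping)  # {'': 0, 'are': 1, 'the': 2, '\n': 3}
print(rows[2])  # [0, 0, 1, 0]
```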
    

    Using scikit-learn

    from sklearn.preprocessing import LabelEncoder
    import numpy as np
    
    code = np.array(code)
    
    label_encoder = LabelEncoder()
    vec = label_encoder.fit_transform(code)
    
    # array([ 2,  6,  7,  9, 19,  1, 16,  0, 17,  0,  3, 10,  5, 21, 11, 18, 19,
    #         4, 22, 14, 13, 12,  0, 20,  8, 15])
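`LabelEncoder` assigns each string its position among the sorted distinct values. If you would rather not depend on scikit-learn, `numpy.unique` with `return_inverse=True` yields the same kind of codes; a sketch with a hypothetical sample list:

```python
import numpy as np

# Hypothetical sample; the question's full `code` list is truncated
words = np.array(["", "are", "the", "\n", "the", "\n"])

# classes: sorted distinct strings; vec: index of each word within classes
classes, vec = np.unique(words, return_inverse=True)

print(classes.tolist())  # ['', '\n', 'are', 'the']
print(vec.tolist())      # [0, 2, 3, 1, 3, 1]

# classes[vec] maps the integer codes back to the original strings
print(classes[vec].tolist())
```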
    

    You can now feed this into keras.utils.to_categorical:

    from keras.utils import to_categorical
    
    to_categorical(vec)
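For intuition, `to_categorical` turns each integer label into a row of an identity matrix, and when `num_classes` is not given it uses `max(vec) + 1` classes. The NumPy sketch below mimics that for 1-D input; it is an illustration rather than Keras's own implementation, and the helper `one_hot`, its `float32` output, and the `vec` values are assumptions.

```python
import numpy as np

def one_hot(labels, num_classes=None):
    """Hypothetical helper: one identity-matrix row per integer label (1-D input)."""
    labels = np.asarray(labels, dtype=np.int64)
    if num_classes is None:
        num_classes = int(labels.max()) + 1  # same default rule as to_categorical
    return np.eye(num_classes, dtype=np.float32)[labels]


# Hypothetical integer codes, e.g. from LabelEncoder or np.unique
vec = [0, 2, 3, 1, 3, 1]
matrix = one_hot(vec)
print(matrix.shape)                    # (6, 4)
print(matrix.argmax(axis=1).tolist())  # [0, 2, 3, 1, 3, 1]
```

`argmax(axis=1)` recovers the integer codes, which can then be mapped back to strings with `label_encoder.inverse_transform`.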
    
  • 2021-01-02 04:12

    Try converting it to a numpy array first:

    from numpy import array

    and then:

    to_categorical(array(code))
