Question
Why am I getting the same result for different words?
>>> import keras
>>> keras.__version__
'1.0.0'
>>> import theano
>>> theano.__version__
'0.8.1'
>>> from keras.preprocessing.text import one_hot
>>> one_hot('START', 43)
[26]
>>> one_hot('children', 43)
[26]
Answer 1:
Uniqueness is not guaranteed by Keras's one_hot encoding: two different words can be mapped to the same index.
See the Keras documentation for one_hot.
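If unique indices are needed, one option is to build an explicit vocabulary instead of hashing. The following is only a minimal sketch in plain Python; `build_vocabulary` is a hypothetical helper written for illustration, not part of Keras.

```python
def build_vocabulary(words):
    """Assign each distinct word its own index, starting at 1.

    Hypothetical helper for illustration; not a Keras function.
    """
    vocab = {}
    for w in words:
        if w not in vocab:
            # The next free index is the current vocabulary size plus one.
            vocab[w] = len(vocab) + 1
    return vocab


vocab = build_vocabulary(["start", "children", "start", "play"])
indices = [vocab[w] for w in ["start", "children"]]
```

Because every new word receives the next unused index, two different words can never share one, at the cost of having to store the whole vocabulary.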
Answer 2:
From the Keras source code, you can see that each word is hashed, reduced modulo n - 1, and then shifted by 1, so with n = 43 every word lands in one of only 42 possible indices:
def one_hot(text, n,
            filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n',
            lower=True,
            split=' '):
    seq = text_to_word_sequence(text,
                                filters=filters,
                                lower=lower,
                                split=split)
    return [(abs(hash(w)) % (n - 1) + 1) for w in seq]
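With n = 43 only the indices 1 through 42 are available, so two different words such as 'START' and 'children' can easily collide. The chance of a collision grows quickly with the number of distinct words and shrinks as n gets larger.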
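To get a feel for how likely a collision is, here is a rough sketch in plain Python. It mirrors the `% (n - 1) + 1` scheme from the source above, but swaps the built-in `hash` for an md5-based hash so the results are repeatable across runs; that swap, the helper names `stable_one_hot` and `collision_probability`, and the assumption that the hash spreads words uniformly are all mine, not part of Keras.

```python
import hashlib


def stable_one_hot(words, n):
    """Map each word to an index in 1..n-1 (hypothetical helper).

    Uses md5 instead of the built-in hash() so output is repeatable.
    """
    return [
        int(hashlib.md5(w.encode("utf-8")).hexdigest(), 16) % (n - 1) + 1
        for w in words
    ]


def collision_probability(num_words, n):
    """Chance that at least two of num_words distinct words share an index.

    Assumes the hash spreads words uniformly over the n - 1 slots.
    """
    slots = n - 1
    if num_words > slots:
        return 1.0  # more words than slots: a collision is unavoidable
    p_unique = 1.0
    for i in range(num_words):
        p_unique *= (slots - i) / slots
    return 1.0 - p_unique


# 43 distinct words cannot fit into 42 slots without sharing an index.
words = ["word%d" % i for i in range(43)]
indices = stable_one_hot(words, 43)
```

Under the uniformity assumption, two specific words collide with probability 1/42 when n = 43, and once the text has more distinct words than n - 1 a collision is certain. Choosing a much larger n reduces the risk but never removes it.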
Source: https://stackoverflow.com/questions/36591078/one-hot-encoding-giving-same-number-for-different-words-in-keras