I would like to convert a sentence to an array of one-hot vectors. Each vector would be the one-hot representation of a letter of the alphabet. It would look like the following:
Here's a vectorized approach using NumPy broadcasting that gives us an (N,26)
shaped array, where N is the number of characters -
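import numpy as np
# np.frombuffer replaces the deprecated binary-mode np.fromstring; 'a' is ASCII 97, so a-z map to 0..25
ints = np.frombuffer(b"hello", dtype=np.uint8) - 97
out = (ints[:,None] == np.arange(26)).astype(int)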
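To make the broadcasting step easier to follow, here is a minimal sketch that wraps the same idea in a function. The name one_hot_broadcast is hypothetical, and the sketch assumes the input contains only lowercase ASCII letters a-z.

```python
import numpy as np

# Hypothetical helper; assumes text contains only lowercase ASCII letters a-z.
def one_hot_broadcast(text):
    # Letter codes shifted to 0..25 ('a' is ASCII 97)
    ints = np.frombuffer(text.encode("ascii"), dtype=np.uint8) - 97
    # ints[:, None] has shape (N, 1) and np.arange(26) has shape (26,),
    # so the comparison broadcasts to an (N, 26) boolean array.
    return (ints[:, None] == np.arange(26)).astype(int)

out = one_hot_broadcast("hello")
print(out.shape)
```

Each row holds a single 1 at the column index of its letter, because exactly one of the 26 comparisons in that row is true.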
If you are looking for performance, I would suggest initializing an array of zeros and then assigning the ones by index -
out = np.zeros((len(ints),26),dtype=int)
out[np.arange(len(ints)), ints] = 1
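The assignment approach can be sketched as a function too. The name one_hot_assign and the range check are my own additions, not part of the snippet above; the check is there because an out-of-range character would otherwise raise an IndexError or silently set the wrong column.

```python
import numpy as np

# Hypothetical helper; the validation step is an added assumption.
def one_hot_assign(text, alphabet_size=26):
    ints = np.frombuffer(text.encode("ascii"), dtype=np.uint8) - 97
    # uint8 arithmetic wraps around, so any ASCII character outside a-z
    # ends up as a value >= 26 and is caught here.
    if ints.size and ints.max() >= alphabet_size:
        raise ValueError("expected only lowercase letters a-z")
    out = np.zeros((len(ints), alphabet_size), dtype=int)
    # Row i gets a 1 in column ints[i]
    out[np.arange(len(ints)), ints] = 1
    return out

out = one_hot_assign("hello")
print(out.shape)
```

This version avoids building the intermediate (N,26) boolean array and writes only N entries, which is likely where the speedup comes from.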
Sample run -
In [153]: ints = np.frombuffer(b"hello", dtype=np.uint8) - 97
In [154]: ints
Out[154]: array([ 7,  4, 11, 11, 14], dtype=uint8)
In [155]: out = (ints[:,None] == np.arange(26)).astype(int)
In [156]: print(out)
[[0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]]
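As a quick sanity check on the output above, the one-hot rows can be mapped back to letters by taking the column index of the 1 in each row. This is only a sketch for verifying the encoding, again assuming lowercase a-z input.

```python
import numpy as np

ints = np.frombuffer(b"hello", dtype=np.uint8) - 97
out = (ints[:, None] == np.arange(26)).astype(int)

# argmax along each row recovers the letter index; add 97 to get ASCII codes back
codes = out.argmax(axis=1) + 97
decoded = codes.astype(np.uint8).tobytes().decode("ascii")
print(decoded)  # hello
```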