I am trying to use NumPy's optimized built-in functions to generate a thermometer encoding. Thermometer encoding basically means generating n 1's at the start of an array of a given length, with the remaining entries set to 0.
In [21]: import numpy as np
In [22]: x = [2, 3, 4, 1, 0, 8]
In [23]: length = 8
In [24]: (np.arange(length) < np.array(x).reshape(-1, 1)).astype(int)
Out[24]:
array([[1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1]])
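The broadcasting comparison above can be wrapped in a small helper. This is only a minimal sketch: the function name `thermometer_encode` is hypothetical, and it assumes every value should lie between 0 and `length` inclusive, which the original snippet does not check.

```python
import numpy as np

def thermometer_encode(values, length):
    """Return a (len(values), length) array whose row i starts with values[i] ones."""
    values = np.asarray(values, dtype=int)
    # Assumption (not in the original snippet): out-of-range values are errors.
    if values.size and (values.min() < 0 or values.max() > length):
        raise ValueError("values must lie in the range [0, length]")
    # Broadcasting: shape (length,) compared with shape (n, 1) gives (n, length).
    return (np.arange(length) < values.reshape(-1, 1)).astype(int)

encoded = thermometer_encode([2, 3, 4, 1, 0, 8], 8)
```

Each row of `encoded` then sums to the corresponding input value, which is a quick way to sanity-check the result.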
Or, create an array of the various lengths of "bars":
In [46]: k = np.arange(length + 1)
In [47]: bars = (k[:-1] < k.reshape(-1, 1)).astype(int)
In [48]: bars
Out[48]:
array([[0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 1, 1, 1]])
and use it as a lookup table:
In [49]: bars[x]
Out[49]:
array([[1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 1]])
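One thing to watch with the lookup-table approach: integer array indexing accepts negative indices, so a value of -1 would silently return the last row (all ones) instead of failing. A possible guard, sketched with a hypothetical helper name `encode_with_table`:

```python
import numpy as np

length = 8
k = np.arange(length + 1)
# Row n of the table holds n leading ones, as in the session above.
bars = (k[:-1] < k.reshape(-1, 1)).astype(int)

def encode_with_table(values, table):
    """Look up each value's row in a precomputed thermometer table."""
    values = np.asarray(values, dtype=int)
    # Negative indices would wrap around to the end of the table, so reject them.
    if (values < 0).any():
        raise ValueError("values must be non-negative")
    # Values larger than the table allows raise IndexError from NumPy itself.
    return table[values]

encoded = encode_with_table([2, 3, 4, 1, 0, 8], bars)
```

Because the lookup is plain integer array indexing, `values` can have any shape; the result gains one trailing axis of size `length`.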
In the above code, the preallocated array bars has shape (length+1, length). A more memory-efficient representation of bars can be created using:
In [61]: from numpy.lib.stride_tricks import as_strided
In [62]: u = np.zeros(2*length, dtype=int)
In [63]: u[length:] = 1
In [64]: bars = as_strided(u[length-1:], shape=(length+1, length), strides=(u.strides[0], -u.strides[0]))
In [65]: bars
Out[65]:
array([[0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 0, 0, 0],
       [1, 1, 1, 1, 1, 1, 0, 0],
       [1, 1, 1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 1, 1, 1]])
Then bars is a view of the one-dimensional array u, and it only uses 2*length integers.
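Since many entries of the strided view alias the same element of u, writing to it would change several "cells" at once. A cautious variant, sketched here with the hypothetical names `bars_view` and `bars_dense`, passes `writeable=False` to `as_strided` and checks the view against the dense table:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

length = 8
u = np.zeros(2 * length, dtype=int)
u[length:] = 1                      # u is `length` zeros followed by `length` ones

step = u.strides[0]
# Element [i, j] of the view reads u[length - 1 + i - j].
bars_view = as_strided(u[length - 1:], shape=(length + 1, length),
                       strides=(step, -step), writeable=False)

# Dense table from the earlier approach, used here only for comparison.
k = np.arange(length + 1)
bars_dense = (k[:-1] < k.reshape(-1, 1)).astype(int)

same = np.array_equal(bars_view, bars_dense)   # both hold identical values
shares = np.shares_memory(bars_view, u)        # the view owns no data of its own
encoded = bars_view[[2, 3, 4, 1, 0, 8]]        # the lookup returns an ordinary copy
```

Note that the saving applies only to the table itself: the result of the lookup is a regular array with one full row per input value.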