While doing logistic regression, it is common practice to use one-hot vectors as the desired output, so the number of classes equals the number of nodes in the output layer. We don't use binary encoding for the targets.
https://github.com/scikit-learn-contrib/categorical-encoding
Binary encoding (and in fact, base-anything encoding) is supported in category_encoders. In our case, we end up with one feature per place in the binary string, so it's not one feature with the value '011' or '010'; it's three features with the values [0, 1, 1] and [0, 1, 0] respectively.
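As a minimal sketch of what that looks like in practice (assuming a pandas DataFrame with a hypothetical 'color' column; the exact output column names depend on the library version):

    import pandas as pd
    import category_encoders as ce

    # Hypothetical data: one categorical column.
    df = pd.DataFrame({'color': ['red', 'green', 'blue', 'yellow', 'green']})

    # BinaryEncoder maps each category to an ordinal code, then splits the
    # binary representation of that code into one 0/1 column per bit.
    encoder = ce.BinaryEncoder(cols=['color'])
    encoded = encoder.fit_transform(df)

    print(encoded)
    # Several columns (roughly color_0, color_1, ...) with 0/1 values,
    # i.e. one feature per place in the binary string rather than one
    # feature holding a string like '011'.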
It is fine if you encode with binary, but you probably need to add another layer (or a filter), depending on your task and model, because the binary representation now implies invalid shared features between unrelated categories.
For example, a binary encoding for the input x = [x1, x2]:

'apple'  = [0, 0]
'orange' = [0, 1]
'table'  = [1, 0]
'chair'  = [1, 1]
It means that orange and chair share the same feature x2. Now take predictions for two classes y:

'fruit'     = 0
'furniture' = 1
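As a quick sketch of this toy setup (assuming numpy; the arrays simply transcribe the tables above):

    import numpy as np

    # Binary-encoded inputs x = [x1, x2] and labels y (0 = fruit, 1 = furniture).
    X = np.array([[0, 0],   # apple
                  [0, 1],   # orange
                  [1, 0],   # table
                  [1, 1]])  # chair
    y = np.array([0, 0, 1, 1])

    # orange and chair share x2 == 1 even though their labels differ.
    print(X[1, 1], X[3, 1])  # -> 1 1
    print(y[1], y[3])        # -> 0 1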
And a linear optimization model (W = [w1, w2] and bias b) for a labeled data sample:

(argmin W) Loss = y - (w1 * x1 + w2 * x2 + b)
Whenever you update the weight w2 to push chair toward furniture, you get a positive push toward choosing orange for this class as well, because the two categories share x2. In this particular case, if you add another layer U = [u1, u2], you can probably solve it:
(argmin U,W) Loss = y - (u1 * (w1 * x1 + w2 * x2 + b) +
                         u2 * (w1 * x1 + w2 * x2 + b) +
                         b2)
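To make that concrete, here is a small illustrative sketch (not from the original answer): one gradient step of the single-layer model on the labeled sample (chair, furniture), using a squared-error loss and a made-up learning rate, after which the score for orange has also moved toward furniture:

    import numpy as np

    x_chair  = np.array([1.0, 1.0])   # 'chair'
    x_orange = np.array([0.0, 1.0])   # 'orange'

    # Linear model: score = w1*x1 + w2*x2 + b, target y (0 = fruit, 1 = furniture).
    w = np.zeros(2)
    b = 0.0

    def score(x):
        return w @ x + b

    print('orange score before:', score(x_orange))  # 0.0

    # One gradient step on (chair, y=1) for the loss 0.5 * (y - score)^2,
    # with learning rate 0.1.
    y, lr = 1.0, 0.1
    err = y - score(x_chair)
    w += lr * err * x_chair
    b += lr * err

    # orange shares x2 with chair, so its furniture score went up too.
    print('orange score after:', score(x_orange))   # 0.2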
OK, so why not avoid this misrepresentation by using one-hot encoding? :)
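For comparison, a minimal one-hot sketch (using pandas.get_dummies purely as an illustration):

    import pandas as pd

    items = pd.Series(['apple', 'orange', 'table', 'chair'], name='item')
    one_hot = pd.get_dummies(items)

    print(one_hot)
    # Each category gets its own column with a single 1 per row, so chair
    # and orange no longer share any input feature and a weight update for
    # one cannot pull the other along.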