Handling unknown values for label encoding
How can I handle unknown values when label encoding in scikit-learn? The label encoder simply raises an exception saying that new (previously unseen) labels were detected. What I actually want is to encode my categorical variables with a one-hot encoder, but scikit-learn's one-hot encoder does not support strings, so I first applied a label encoder to each column. My problem is that unknown labels show up during the cross-validation step of my pipeline. The plain one-hot encoder has an option to ignore such cases (`handle_unknown='ignore'`), but the label encoder has none. Encoding everything a priori with `pandas.get_dummies` or `.cat.codes` is not sufficient, because the pipeline has to work on real-life, freshly incoming data as well.
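One way to get the behaviour described above is a label encoder that maps any label not seen during `fit` to a reserved code instead of raising, followed by a one-hot step that turns that reserved code into an all-zero row. The sketch below is pure Python so it runs without scikit-learn; `SafeLabelEncoder`, `unknown_code` and `one_hot` are hypothetical names chosen for illustration, not part of any library. It only mimics the `fit`/`transform` method names that scikit-learn transformers use, and it assumes the labels in a column are mutually sortable (e.g. all strings).

```python
from typing import Dict, Hashable, List, Sequence


class SafeLabelEncoder:
    """Hypothetical label encoder that tolerates unseen labels.

    Labels seen during fit() get codes 0..n-1 (in sorted order); any label
    not seen during fit() is mapped to `unknown_code` instead of raising.
    """

    def __init__(self, unknown_code: int = -1) -> None:
        self.unknown_code = unknown_code
        self.mapping_: Dict[Hashable, int] = {}

    def fit(self, values: Sequence[Hashable]) -> "SafeLabelEncoder":
        # Sort the distinct labels so the codes are deterministic.
        self.mapping_ = {label: code for code, label in enumerate(sorted(set(values)))}
        return self

    def transform(self, values: Sequence[Hashable]) -> List[int]:
        # dict.get falls back to the reserved code for unseen labels.
        return [self.mapping_.get(v, self.unknown_code) for v in values]

    def fit_transform(self, values: Sequence[Hashable]) -> List[int]:
        return self.fit(values).transform(values)


def one_hot(codes: Sequence[int], n_classes: int) -> List[List[int]]:
    """One-hot encode integer codes; out-of-range codes give an all-zero row."""
    rows = []
    for c in codes:
        row = [0] * n_classes
        if 0 <= c < n_classes:  # the reserved unknown code is skipped here
            row[c] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Fit on training data only, then transform "fresh" data containing an unseen label.
    enc = SafeLabelEncoder().fit(["red", "green", "blue", "green"])
    codes = enc.transform(["green", "purple"])
    print(codes)                                 # [1, -1]
    print(one_hot(codes, len(enc.mapping_)))     # [[0, 1, 0], [0, 0, 0]]
```

Fitting on the training fold and only calling `transform` on the validation fold (or on new data) is what keeps unseen categories from leaking in. Note that scikit-learn 0.20 and later let `OneHotEncoder` take string columns directly together with `handle_unknown='ignore'`, which may make a separate label-encoding step unnecessary.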