Question
For example, say I am trying to train a binary classifier that takes sample inputs of the form
x = {d=(type of desk), p1=(type of pen on desk), p2=(type of *another* pen on desk)}
Say I then train a model on the samples:
x1 = {wood, ballpoint, gel}, y1 = {0}
x2 = {wood, ballpoint, ink-well}, y2 = {1}.
and try to predict on the new sample:
x3 = {wood, gel, ballpoint}
The response that I am hoping for in this case is y3 = {0}, since conceptually it should not matter (i.e., I don't want it to matter) which pen is designated as p1 or p2.
When trying to run this model (in my case, an h2o.ai-generated model), I get an error saying that the categorical value for p2 is not valid, since the model never saw 'ballpoint' in p2's category during training (in h2o: hex.genmodel.easy.exception.PredictUnknownCategoricalLevelException).
My first idea was to generate permutations of the 'pens' features for each sample to train the model on. Is there a better way to handle this situation? Specifically, is there a solution within the h2o.ai Flow UI, since that is what I am using to build the model? Thanks.
Answer 1:
H2O binary models (models running in the H2O cluster) handle unseen categorical levels automatically. However, when you generate predictions with an exported POJO model in pure Java (as in your case), this is a configurable option. In the EasyPredictModelWrapper, the default behavior is that unknown categorical levels throw PredictUnknownCategoricalLevelException, which is why you are seeing that error.
There is more info about this in the EasyPredictModelWrapper Javadocs, which describe the class as follows:
The easy prediction API for generated POJO and MOJO models. Use as follows: 1. Instantiate an EasyPredictModelWrapper. 2. Create a new row of data. 3. Call one of the predict methods.
Here is an example:
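import hex.genmodel.GenModel;
import hex.genmodel.easy.EasyPredictModelWrapper;
import hex.genmodel.easy.RowData;
import hex.genmodel.easy.prediction.BinomialModelPrediction;

// Step 1. Load the generated POJO class and wrap it.
String modelClassName = "your_pojo_model_downloaded_from_h2o";
GenModel rawModel =
    (GenModel) Class.forName(modelClassName).getDeclaredConstructor().newInstance();
EasyPredictModelWrapper model = new EasyPredictModelWrapper(
    new EasyPredictModelWrapper.Config()
        .setModel(rawModel)
        // Treat unseen categorical levels as NA instead of throwing
        // PredictUnknownCategoricalLevelException.
        .setConvertUnknownCategoricalLevelsToNa(true));
// Step 2. Build a row; numeric values may be given as String or Double.
RowData row = new RowData();
row.put("CategoricalColumnName", "LevelName");
row.put("NumericColumnName1", "42.0");
row.put("NumericColumnName2", Double.valueOf(42.0));
// Step 3. Predict.
BinomialModelPrediction p = model.predictBinomial(row);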
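Converting unknown levels to NA avoids the exception, but it does not by itself make the model treat p1 and p2 symmetrically. The permutation idea from the question could be sketched roughly as below. This is only an illustration in plain Java, not part of H2O: the class PenAugmentation, the Sample type, and the method withSwappedPens are hypothetical names, and it assumes the training rows are prepared outside of Flow and imported afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PenAugmentation {
    /** Hypothetical training row: desk type, two pen types, binary label. */
    public static final class Sample {
        public final String desk, p1, p2;
        public final int label;

        public Sample(String desk, String p1, String p2, int label) {
            this.desk = desk;
            this.p1 = p1;
            this.p2 = p2;
            this.label = label;
        }

        @Override
        public String toString() {
            return "{" + desk + ", " + p1 + ", " + p2 + "} -> " + label;
        }
    }

    /** Returns every input sample plus a copy with p1 and p2 swapped. */
    public static List<Sample> withSwappedPens(List<Sample> samples) {
        List<Sample> out = new ArrayList<>();
        for (Sample s : samples) {
            out.add(s);
            // Skip the swap when both pens are the same type, to avoid exact duplicates.
            if (!s.p1.equals(s.p2)) {
                out.add(new Sample(s.desk, s.p2, s.p1, s.label));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Sample> train = Arrays.asList(
            new Sample("wood", "ballpoint", "gel", 0),
            new Sample("wood", "ballpoint", "ink-well", 1));
        for (Sample s : withSwappedPens(train)) {
            System.out.println(s);
        }
    }
}
```

With the swapped copies included, 'ballpoint' appears as a level of p2 in the training data, and the swapped copy of x1 has the same feature values as x3.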
Source: https://stackoverflow.com/questions/45093030/training-model-with-multiple-features-whos-values-are-conceptually-the-same