Training model with multiple features who's values are conceptually the same

落花浮王杯 提交于 2019-12-11 05:29:39

问题


For example, say I am trying to train a binary classifier that takes sample inputs of the form

x = {d=(type of desk), p1=(type of pen on desk), p2=(type of *another* pen on desk)}

Say I then train a model on the samples:

x1 = {wood, ballpoint, gel},      y1 = {0}

x2 = {wood, ballpoint, ink-well}, y2 = {1}.

and try to predict on the new sample: x3 = {wood, gel, ballpoint}. The response that I am hoping for in this case is y3 = {0}, since conceptually it should not matter (ie. I don't want it to matter) which pen is designated as p1 or p2.

When trying to run this model (in my case, using an h2o.ai generated model), I get the error that the category enum for p2 is not valid (since the model has never seen 'ballpoint' in p2's category during training) (in h2o: hex.genmodel.easy.exception.PredictUnknownCategoricalLevelException)

My first idea was to generate permutations of the 'pens' features for each sample to train the model on. Is there a better way to handle this situation? Specifically, in h2o.ai Flow UI solution, since that is what I am using to build the model. Thanks.


回答1:


H2O binary models (models running in the H2O cluster) will handle unseen categorical levels automatically, however, in when you are generating predictions using the pure Java POJO model method (like in your case), this is a configurable option. In the EasyPredictModelWrapper, the default behavior is that unknown categorical levels throw PredictUnknownCategoricalLevelException, which is why you are seeing that error.

There is more info about this in the EasyPredictModelWrapper Javadocs. Here is an example:

The easy prediction API for generated POJO and MOJO models. Use as follows: 1. Instantiate an EasyPredictModelWrapper 2. Create a new row of data 3. Call one of the predict methods

Here is an example:

// Step 1.
modelClassName = "your_pojo_model_downloaded_from_h2o";
GenModel rawModel;
rawModel = (GenModel) Class.forName(modelClassName).newInstance();

EasyPredictModelWrapper model = new EasyPredictModelWrapper(
                                    new EasyPredictModelWrapper.Config()
                                        .setModel(rawModel)
                         .setConvertUnknownCategoricalLevelsToNa(true));

// Step 2.
RowData row = new RowData();
row.put(new String("CategoricalColumnName"), new String("LevelName"));
row.put(new String("NumericColumnName1"), new String("42.0"));
row.put(new String("NumericColumnName2"), new Double(42.0));

// Step 3.
BinomialModelPrediction p = model.predictBinomial(row);


来源:https://stackoverflow.com/questions/45093030/training-model-with-multiple-features-whos-values-are-conceptually-the-same

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!